Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models
![Page 1: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/1.jpg)
Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models
A Collaborative Approach to Analyzing Stream Network Data
Andrew A. Merton
![Page 2: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/2.jpg)
Overview
• The material presented here is a subset of the work done by Erin Peterson for her Ph.D.
  • Interested in developing geostatistical models for predicting water quality characteristics in stream segments
  • Data: Maryland Biological Stream Survey (MBSS)
• The scope and nature of the problem require interdisciplinary collaboration
  • Ecology, geoscience, statistics, others…
![Page 3: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/3.jpg)
Stream Network Data
• The response data comprise observations within a stream network
  • What does it mean to be a “neighbor” in such a framework?
  • How does one characterize the distance between “neighbors”?
  • Should distance measures be confined to the stream network?
  • Does flow (direction) matter?
![Page 4: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/4.jpg)
Stream Network Data
• Potential explanatory variables are not restricted to be within the stream network
  • Topography, soil type, land usage, etc.
• How does one sensibly incorporate these explanatory variables into the analysis?
  • Can we develop tools to aggregate upstream watershed covariates for subsequent downstream segments?
![Page 5: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/5.jpg)
Competing Models
• Given a collection of competing models, how does one select the “best” model?
  • Is one subset of explanatory variables better or closer to the “true” model?
  • Should one assume correlated residuals and, if so, what form should the correlation function take?
  • How does the distance measure impact the choice of correlation function?
![Page 6: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/6.jpg)
Functional Distances & Spatial Relationships
Straight-line Distance (SLD) (As the crow flies…)
• Geostatistical models are based on straight-line distance
• Is this an appropriate measure of distance?
• Influential continuous landscape variables: geology type or acid rain
[Figure: diagram with points labeled A, B, and C]
![Page 7: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/7.jpg)
Functional Distances & Spatial Relationships
Symmetric Hydrologic Distance (SHD) (As the fish swims…)
• Hydrologic connectivity
• Distances and relationships are represented differently depending on the distance measure
[Figure: diagram with points labeled A, B, and C]
![Page 8: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/8.jpg)
Functional Distances & Spatial Relationships
Asymmetric Hydrologic Distance (AHD) (As the sh*t flows…)
• Longitudinal transport of material
• Distances and relationships are represented differently depending on the distance measure
[Figure: diagram with points labeled A, B, and C]
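The three distance measures can be contrasted on a toy network. The sketch below is illustrative only: the network layout, segment lengths, and coordinates are invented, and the function names `sld`, `shd`, and `ahd` are hypothetical, not taken from the tools described in the presentation.

```python
import math

# Hypothetical toy network: node -> (next node downstream, channel length in km).
# Sites A and B sit on different tributaries that meet at "confluence";
# site C lies downstream of that confluence.
DOWNSTREAM = {
    "A": ("confluence", 3.0),
    "B": ("confluence", 4.0),
    "confluence": ("C", 2.0),
    "C": ("outlet", 1.0),
}
# Made-up planar coordinates (km) for the three sites.
COORDS = {"A": (0.0, 4.0), "B": (3.0, 4.0), "C": (1.5, 0.0)}


def path_to_outlet(node):
    """Map every node downstream of `node` (and `node` itself) to its flow distance."""
    distances = {node: 0.0}
    total = 0.0
    while node in DOWNSTREAM:
        node, length = DOWNSTREAM[node]
        total += length
        distances[node] = total
    return distances


def sld(a, b):
    """Straight-line distance: ignores the network entirely."""
    return math.dist(COORDS[a], COORDS[b])


def shd(a, b):
    """Symmetric hydrologic distance: shortest path along the channels."""
    pa, pb = path_to_outlet(a), path_to_outlet(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)


def ahd(upstream, downstream):
    """Asymmetric hydrologic distance: defined only when water flows
    from `upstream` to `downstream`; otherwise returns None."""
    return path_to_outlet(upstream).get(downstream)
```

In this toy layout, A and B are neighbors under SLD and SHD but have no AHD relationship, because neither drains into the other; only the pairs (A, C) and (B, C) are flow-connected.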
![Page 9: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/9.jpg)
Candidate Models
• Restrict the model space to general linear models
$$Z \sim N(X\beta, \Sigma) = N(X\beta, \sigma^2 \Omega)$$
  where Ω is the correlation matrix of the residuals
• Look at all possible subsets of explanatory variables X (Hoeting et al)
• Require a correlation structure that can accommodate the various distance measures
  • Could assume that the residuals are spatially independent, i.e., Σ = σ²I (probably not best)
  • Ver Hoef et al propose a better solution
![Page 10: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/10.jpg)
Asymmetric Autocovariance Models for Stream Networks
• Weighted asymmetric hydrologic distance (WAHD)
• Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle
• Moving average models
• Incorporates flow and uses hydrologic distance
• Represents discontinuity at confluences
[Figure: diagram labeled “Flow”]
![Page 11: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/11.jpg)
Exponential Correlation Structure
• The exponential correlation function can be used for both SLD and SHD
$$\rho_{ij} = \begin{cases} 1, & h_{ij} = 0 \\ (1 - \theta_1)\exp(-h_{ij}/\theta_2), & h_{ij} > 0 \end{cases} \qquad i, j = 1, 2, \ldots, n, \ \text{such that } h_{ij} = h_{ji}$$
  where h_ij is the distance between sites i and j, θ₁ is a nugget-type parameter, and θ₂ is a range parameter
• For AHD, one must multiply (element-wise) by the weight matrix A, i.e., ρ*_ij = a_ij ρ_ij, hence WAHD
• The weights represent the proportion of flow volume that the downstream location receives from the upstream location
• Estimating the a_ij is non-trivial – Need special GIS tools (Theobald et al)
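As a rough illustration of the element-wise weighting, the sketch below assumes an exponential correlation with a nugget-type parameter and a range parameter, and multiplies it by a weight matrix. The parameter names (`nugget`, `range_km`), the function names, and all numbers are assumptions made for the example, not values from the study.

```python
import math


def exp_corr(h, nugget, range_km):
    """Exponential correlation at separation distance h (assumed form):
    1 at zero distance, (1 - nugget) * exp(-h / range_km) otherwise."""
    if h == 0:
        return 1.0
    return (1.0 - nugget) * math.exp(-h / range_km)


def wahd_corr(dist, weights, nugget, range_km):
    """Element-wise product of the weights a_ij and the exponential correlation."""
    n = len(dist)
    return [
        [weights[i][j] * exp_corr(dist[i][j], nugget, range_km) for j in range(n)]
        for i in range(n)
    ]


# Made-up example: sites 0 and 1 are flow-connected (5 km apart, weight 0.6);
# site 2 is not flow-connected to either, so its off-diagonal weights are 0.
INF = math.inf
dist = [[0.0, 5.0, INF], [5.0, 0.0, INF], [INF, INF, 0.0]]
weights = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]
R = wahd_corr(dist, weights, nugget=0.1, range_km=10.0)
```

Zero weights are what make the WAHD correlation vanish for sites that do not share flow, regardless of how close they are on the map.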
![Page 12: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/12.jpg)
GIS Tools
Theobald et al have created automated tools to extract data about hydrologic relationships between sample points
Visual Basic for Applications programs that:
1. Calculate separation distances between sites
   • SLD, SHD, Asymmetric hydrologic distance (AHD)
2. Calculate watershed covariates for each stream segment
   • Functional Linkage of Watersheds and Streams (FLoWS)
3. Convert GIS data to a format compatible with statistics software
[Figure: three diagrams of sites 1, 2, and 3, labeled SLD, SHD, and AHD]
![Page 13: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/13.jpg)
Spatial Weights for WAHD
Proportional influence: influence of each neighboring sample site on a downstream sample site
• Weighted by catchment area: Surrogate for flow
1. Calculate influence of each upstream segment on segment directly downstream
2. Calculate the proportional influence of one sample site on another
   • Multiply the edge proportional influences
3. Output:
   • n×n weighted incidence matrix
[Figure legend: stream confluence, stream segment]
![Page 14: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/14.jpg)
![Page 15: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/15.jpg)
![Page 16: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/16.jpg)
Spatial Weights for WAHD
[Figure: stream network with labels A through H; legend: survey sites, stream segment]
![Page 17: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/17.jpg)
Spatial Weights for WAHD
[Figure: stream network with labels A through H]
Site PI = B * D * F * G
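The proportional-influence steps can be sketched as follows. This is a minimal illustration under stated assumptions: the segment names, catchment areas, and network shape are invented, and the edge influence is assumed to be a segment's catchment area divided by the combined catchment area of the segments meeting at the same confluence. The function names `edge_pi`, `site_pi`, and `weight_matrix` are hypothetical.

```python
# Hypothetical network: segment -> (segment directly downstream or None,
# upstream catchment area in km^2). All values are made up.
SEGMENTS = {
    "s1": ("s3", 30.0),
    "s2": ("s3", 10.0),
    "s3": ("s5", 40.0),
    "s4": ("s5", 60.0),
    "s5": (None, 100.0),
}


def edge_pi(segments):
    """Step 1: influence of each segment on the segment directly downstream,
    taken here as its share of the catchment area arriving at the confluence."""
    totals = {}
    for down, area in segments.values():
        if down is not None:
            totals[down] = totals.get(down, 0.0) + area
    return {
        seg: area / totals[down]
        for seg, (down, area) in segments.items()
        if down is not None
    }


def site_pi(segments, upstream, downstream):
    """Step 2: multiply the edge influences along the flow path.
    Returns 0.0 when `upstream` does not drain into `downstream`."""
    pis = edge_pi(segments)
    weight, seg = 1.0, upstream
    while seg != downstream:
        nxt = segments[seg][0]
        if nxt is None:
            return 0.0
        weight *= pis[seg]
        seg = nxt
    return weight


def weight_matrix(segments, sites):
    """Step 3: n x n matrix whose (i, j) entry is the influence of site i on site j."""
    return [[site_pi(segments, a, b) for b in sites] for a in sites]
```

For example, `site_pi(SEGMENTS, "s1", "s5")` multiplies the two edge influences along the path s1 → s3 → s5, which mirrors the product form of "Site PI = B * D * F * G".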
![Page 18: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/18.jpg)
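Parameter Estimation
• Maximize the (profile) likelihood to obtain estimates for β, θ, and σ²
$$\hat{\beta}(\theta) = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Z$$
$$\hat{\sigma}^2_{MLE}(\theta) = \frac{(Z - X\hat{\beta})'\,\Omega^{-1}(Z - X\hat{\beta})}{n}$$
Profile likelihood:
$$\ell_{profile}(\theta; \hat{\beta}, \hat{\sigma}^2, Z) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat{\sigma}^2) - \frac{1}{2}\log|\Omega| - \frac{n}{2}$$
where Ω = Ω(θ) is the correlation matrix, so that Σ = σ²Ω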
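A minimal numerical sketch of the profile likelihood follows, assuming the standard Gaussian form in which the regression coefficients and variance are profiled out for a fixed correlation matrix. It uses NumPy; the function name `profile_loglik` and the argument name `omega` (the correlation matrix for a given set of autocorrelation parameters) are choices made for this example.

```python
import numpy as np


def profile_loglik(Z, X, omega):
    """Return (beta_hat, sigma2_hat, profile log-likelihood) for a fixed
    correlation matrix `omega` (assumed symmetric positive definite)."""
    n = len(Z)
    omega_inv_X = np.linalg.solve(omega, X)
    omega_inv_Z = np.linalg.solve(omega, Z)
    # Generalized least squares estimate of the regression coefficients.
    beta_hat = np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_Z)
    resid = Z - X @ beta_hat
    # Maximum likelihood estimate of the variance (divides by n, not n - p).
    sigma2_hat = resid @ np.linalg.solve(omega, resid) / n
    _, logdet = np.linalg.slogdet(omega)
    loglik = (
        -0.5 * n * np.log(2.0 * np.pi)
        - 0.5 * n * np.log(sigma2_hat)
        - 0.5 * logdet
        - 0.5 * n
    )
    return beta_hat, sigma2_hat, loglik
```

In practice one would wrap this in a numerical optimizer over the autocorrelation parameters, rebuilding `omega` at each step; with `omega` set to the identity matrix the estimates reduce to ordinary least squares.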
![Page 19: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/19.jpg)
Model Selection
• Hoeting et al adapted the corrected Akaike Information Criterion (AICC) for spatial models
  • AICC estimates the difference between the candidate model and the “true” model
  • Select models with small AICC
$$AICC = -2\,\ell_{profile}(\theta; \hat{\beta}, \hat{\sigma}^2, Z) + 2n\,\frac{p + k + 1}{n - p - k - 2}$$
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters
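The criterion is simple to compute once the maximized profile log-likelihood is available. The sketch below assumes the form −2ℓ + 2n(p + k + 1)/(n − p − k − 2); the function name `aicc` and the example numbers are made up.

```python
def aicc(profile_loglik, n, p, k):
    """Corrected AIC for a spatial model (assumed form).

    n: number of observations
    p: number of regression coefficients (covariates plus intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("AICC is undefined unless n > p + k + 2")
    return -2.0 * profile_loglik + 2.0 * n * (p + k + 1) / denom


# Made-up comparison: a larger model must improve the likelihood enough
# to offset its penalty.
small = aicc(-100.0, n=100, p=3, k=2)
large = aicc(-99.5, n=100, p=6, k=2)
best = "small" if small < large else "large"
```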
![Page 20: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/20.jpg)
Spatial Distribution of MBSS Data
[Figure: map of MBSS data with north arrow]
![Page 21: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/21.jpg)
Summary Statistics for Distance Measures
• Distance measure greatly impacts the number of neighboring sites as well as the median, mean, and maximum separation distance between sites

Summary statistics for distance measures in kilometers using DO (n=826).

| Distance Measure | N Pairs | Min | Median | Mean | Max |
|---|---|---|---|---|---|
| Straight Line Distance | 340725 | 0.05 | 101.02 | 118.16 | 385.53 |
| Symmetric Hydrologic Distance | 62625 | 0.05 | 156.29 | 187.10 | 611.74 |
| Pure Asymmetric Hydrologic Distance * | 1117 | 0.05 | 4.49 | 5.83 | 27.44 |

* Asymmetric hydrologic distance is not weighted here
![Page 22: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/22.jpg)
Comparing Distance Measures
• The “selected” models (one for each distance measure) were compared by computing the mean square prediction error (MSPE)
  • GLM: Assumed independent errors
  • Withheld the same 100 (randomly) selected records from each model fit
  • Want MSPE to be small
$$MSPE = \frac{1}{n_p}\sum_{i=1}^{n_p}\left(Z_i - \hat{Z}_i\right)^2$$
where n_p is the number of withheld records
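The MSPE is the average squared difference between the withheld observations and their predictions. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def mspe(observed, predicted):
    """Mean square prediction error over the withheld records."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(z - z_hat) ** 2 for z, z_hat in zip(observed, predicted)]
    return sum(squared_errors) / len(observed)


# Made-up withheld values and predictions from two competing models.
withheld = [1.0, 2.0, 3.0]
model_a = [1.0, 2.5, 2.0]
model_b = [1.5, 2.5, 3.5]
scores = {"a": mspe(withheld, model_a), "b": mspe(withheld, model_b)}
```

Because the same records are withheld from every model fit, the scores are directly comparable across distance measures.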
![Page 23: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/23.jpg)
Comparing Distance Measures
Prediction Performance for Various Responses
[Figure: eight bar charts of MSPE by model (GLM, SL, SH, WAH), one panel per response: ANC, COND, DOC, DO, NO3, SO4, TEMP, PHLAB. Legend: GLM, SLD, SHD, WAHD.]
![Page 24: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/24.jpg)
Maps of the Relative Weights
• Generated maps by kriging (interpolation)
  • Predicted values are linear combinations of the “observed” data, i.e.,
$$\hat{Z}_2 = E(Z_2 \mid Z_1) = \left( X_2 (X_1^T \Omega_{11}^{-1} X_1)^{-1} X_1^T \Omega_{11}^{-1} + \Omega_{21}\Omega_{11}^{-1}\left( I - X_1 (X_1^T \Omega_{11}^{-1} X_1)^{-1} X_1^T \Omega_{11}^{-1} \right) \right) Z_1 = M Z_1$$
Z₁ is the observed data, Z₂ is the predicted value, X₁ and X₂ are the covariates at the observed and prediction sites, Ω₁₁ is the correlation matrix for the observed sites, and Ω₂₁ is the correlation matrix between the prediction site and the observed sites
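To make the "linear combination" concrete, the sketch below computes a weight vector M for a single prediction site using the standard universal kriging predictor, so that the prediction is M applied to the observed data. It uses NumPy; the function name `kriging_weights`, the argument names, and the example numbers are assumptions for illustration, and the presentation's own software may differ.

```python
import numpy as np


def kriging_weights(X1, x2, omega11, omega21):
    """Weights M such that the prediction at the new site is M @ Z1.

    X1:      (n, p) covariates at the observed sites
    x2:      (p,)   covariates at the prediction site
    omega11: (n, n) correlation matrix among observed sites (symmetric)
    omega21: (n,)   correlations between the prediction site and observed sites
    """
    n = X1.shape[0]
    omega_inv_X = np.linalg.solve(omega11, X1)
    # (X1' O^-1 X1)^-1 X1' O^-1, using the symmetry of omega11.
    gls = np.linalg.solve(X1.T @ omega_inv_X, omega_inv_X.T)
    trend_part = x2 @ gls
    residual_part = np.linalg.solve(omega11, omega21) @ (np.eye(n) - X1 @ gls)
    return trend_part + residual_part


# Made-up example: four sites on a line with exponential correlation.
coords = np.array([0.0, 1.0, 2.0, 4.0])
omega11 = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)
X1 = np.column_stack([np.ones(4), [0.5, 1.0, 2.0, 3.5]])
omega21 = np.exp(-np.abs(coords - 3.0) / 2.0)  # prediction site at position 3.0
x2 = np.array([1.0, 2.5])
M = kriging_weights(X1, x2, omega11, omega21)
```

Mapping the entries of M back to the observed sites gives the kind of relative-weight map described here: sites with large absolute weight contribute most to the prediction.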
![Page 25: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/25.jpg)
Relative Weights Used to Make Prediction at Site 465
[Figure: four map panels: General Linear Model, Straight-line, Symmetric Hydrologic, Weighted Asymmetric Hydrologic]
![Page 26: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/26.jpg)
![Page 27: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/27.jpg)
Residual Correlations for Site 465
[Figure: four map panels: General Linear Model, Straight-line, Symmetric Hydrologic, Weighted Asymmetric Hydrologic]
![Page 28: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/28.jpg)
![Page 29: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/29.jpg)
Some Comments on the Sampling Design
Probability-based random survey design
• Designed to maximize spatial independence of survey sites
• Does not adequately represent spatial relationships in stream networks using hydrologic distance measures
[Figure: histogram of frequency versus number of neighboring sites (0 to 17)]
244 sites did not have neighbors
Sample Size = 881
Number of sites with ≥ 1 neighbor: 393
Mean number of neighbors per site: 2.81
![Page 30: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models](https://reader036.vdocument.in/reader036/viewer/2022062520/568158b1550346895dc5ff5b/html5/thumbnails/30.jpg)
Conclusions
• A collaborative effort enabled the analysis of a complicated problem
  • Ecology – Posed the problem of interest, provides insight into variable (model) selection
  • Geoscience – Development of powerful tools based on GIS
  • Statistics – Development of valid covariance structures, model selection techniques
  • Others – e.g., very understanding (and sympathetic) spouses…