RSS: Software for Spatial Analysis
Analysis and Visualization of Spatial Data
Richard Pugh
Product Specialist
MathSoft International
Overview
• Introduction
• S+SpatialStats 1.5
• S-PLUS for ArcView GIS 1.2
• EnvironmentalStats for S-PLUS 2.0
• Working with S-PLUS
• Great Interactive Graphics
• Powerful S Programming Language
• Complete Set of Statistical Algorithms
• Full Interoperability and Deployability
S-PLUS: Explore, Model, Visualize
Data Analysis in S-PLUS
A state-of-the-art solution for exploratory data analysis, statistical modeling, and advanced data visualization
Combines the S object-oriented programming language with over 4200 prewritten functions
Offers the most comprehensive set of robust, classical and modern statistical methods available anywhere
S-PLUS
• Over 80 2D & 3D Graph Types
• Fully Object-Oriented Graphics
• Trellis (Conditional) Plots
• Dynamic Brush & Spin
• Linked Plots
• Embed Data in Graphs
• Exclude points from curve fits
• Interactive Plots
• Multiple Axes
• Multiple Plots on Graphs
• Multiple Graphs/Page
• Tabbed Graph Pages
S-PLUS 2000: Graphics
[Figure: four-panel plot at X Angle = 60, 150, 240 and 330; plot of Density against Value; axis label: Weight]
• Basic Statistics
• ANOVA & Regression
• GLMs, GAMs and NLMs
• Non Parametric & Local Regression
• Multivariate Statistics
• Robust Methods
• Survival Analysis
• Tree Models
• Quality Control Charts
• Mixed Effects Models
• Clustering
• Bootstrap / Jackknife
• Smoothing
• Time Series
• Power / Sample Size / Design
• Missing Data Imputation
[Figure: plot of Comp. 1 against Comp. 2 with observations numbered 1 to 25 and the variables diffgeom, complex, algebra, reals and statistics]
S-PLUS 2000: Statistics
[Figure: five panels, Comp. 1 to Comp. 5, each showing values for the variables diffgeom, statistics, reals, complex and algebra]
• C, C++ & FORTRAN object code links
• OLE Automation: Server/Client
• Interaction with UNIX & DOS O/S
• ActiveX
• DDE
• Java
S-PLUS Integration
[Figure: plot of Component 1 against Component 2. These two components explain 96.07% of the point variability.]
• S+SpatialStats
• S-PLUS for ArcView GIS
• EnvironmentalStats for S-PLUS
• S+NUOPT
• S+GARCH
• S+Wavelets
• S+SeqTrial
S-PLUS Add-On Modules
• Geostatistical Data
• Spatial Point Patterns
• Lattice Data
S+SpatialStats
S-PLUS for ArcView GIS
• Link between S-PLUS and ArcView
• Import Data Easily
• Unparalleled Graphics
• Superior Analytical Power
• Data from Monitoring Networks
• Display of Probability Distributions
• Goodness-of-fit Tests
• Sample Size Calculation
• Prediction and Tolerance Intervals
• Risk Assessment
• Type I singly and multiply censored data
EnvironmentalStats for S-PLUS
• Also called random field data
• Measurements taken at fixed locations
• Examples include:
– mineral concentrations in a mine
– rainfall recorded at weather stations
• Small-scale variation / spatial correlation
– closer sites generally have more similar data values
Geostatistical Data
• Producing Empirical Variograms
• Fitting Theoretical Variogram Models
• Exploration for Anisotropy
• Performing Point and Block Kriging
• Simulating Geostatistical Data
Analyzing Geostatistical Data
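To make the first step in the list above concrete, here is a minimal Python sketch of an empirical variogram. It is not S+SpatialStats code: it assumes the classical (Matheron) estimator, and the function name `empirical_variogram`, its parameters, and the four-site transect are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

def empirical_variogram(coords, values, lag_width, max_dist):
    """Classical (Matheron) estimator of the semivariogram.

    Pairs of sites are grouped into distance bins of width lag_width;
    for each bin, gamma = sum((z_i - z_j)^2) / (2 * number_of_pairs).
    Returns a list of (mean_pair_distance, gamma, number_of_pairs).
    """
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    n_pairs = defaultdict(int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue  # ignore pairs at or beyond the maximum lag
            b = int(d // lag_width)  # index of the distance bin
            sq_diff[b] += (values[i] - values[j]) ** 2
            dist_sum[b] += d
            n_pairs[b] += 1
    return [
        (dist_sum[b] / n_pairs[b], sq_diff[b] / (2 * n_pairs[b]), n_pairs[b])
        for b in sorted(n_pairs)
    ]

# Hypothetical transect: four sites one unit apart with a linear trend.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
z = [0.0, 1.0, 2.0, 3.0]
for h, gamma, n in empirical_variogram(sites, z, lag_width=1.0, max_dist=3.5):
    print(f"h={h:.1f}  gamma={gamma:.2f}  pairs={n}")
```

Because the example data follow a pure trend, the estimated semivariance keeps growing with distance; for data with a stable spatial correlation structure it would be expected to level off at a sill.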
• Observations associated with spatial regions
• Examples:
– remote sensed images (regular)
– cancer rates for Washington counties (irregular)
• Neighbourhood structure
• Neighbouring regions may have correlated data
Lattice Data
• Defining a neighborhood structure
• Testing for spatial autocorrelation
• Fitting spatial linear models
• Model selection
Analyzing Lattice Data
• Locations are the variable of interest
• Locations of objects in a spatial region
• Examples:
– trees in a forest
– earthquake epicentres
• Aim to identify:
– spatial randomness
– clustering or regularity
– models for process
Spatial Point Patterns
• Testing for CSR
– Nearest-neighbour methods
• Intensity estimation
• K-functions (second order properties)
• Simulating point process data
Analyzing Spatial Point Patterns
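As an illustration of a nearest-neighbour test for complete spatial randomness (CSR), the Python sketch below computes the Clark-Evans ratio. The slides do not say which nearest-neighbour statistic S+SpatialStats implements, so this choice is an assumption, as are the function name `clark_evans` and the example point sets. No edge correction is applied.

```python
import math

def clark_evans(points, area):
    """Clark-Evans nearest-neighbour ratio and z-score (no edge correction).

    ratio < 1 suggests clustering, ratio > 1 suggests regularity,
    and ratio close to 1 is consistent with CSR.
    """
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n               # mean nearest-neighbour distance
    density = n / area                        # points per unit area
    expected = 1.0 / (2.0 * math.sqrt(density))   # mean distance under CSR
    std_err = 0.26136 / math.sqrt(n * density)    # standard error under CSR
    return observed / expected, (observed - expected) / std_err

# Hypothetical regular pattern: a 5 x 5 grid in a 5 x 5 square.
grid = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
ratio, z_score = clark_evans(grid, area=25.0)
print(f"grid: ratio={ratio:.2f}, z={z_score:.2f}")

# Hypothetical clustered pattern: four points packed into one corner.
cluster = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]
ratio_c, z_c = clark_evans(cluster, area=100.0)
print(f"cluster: ratio={ratio_c:.3f}, z={z_c:.2f}")
```

In practice an edge correction matters for small regions, because points near the boundary tend to have their true nearest neighbour outside the study area.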
SpatialStats Graphical User Interface
S-PLUS for ArcView GIS:
An ArcView GIS extension
Integrates the powerful statistics, data analysis, and presentation quality graphics capabilities of S-PLUS with the cartographic rendering and data management abilities of ArcView GIS
S-PLUS for ArcView GIS dramatically extends the ArcView analysis and charting capabilities
For the first time in ArcView, you get accurate statistical inference which accounts for the spatial dependency pattern
S-PLUS data tables with analysis results can be imported back into ArcView for plotting in a wide range of map projections
Powerful complement to ARC/INFO via data conversion to ArcView GIS formats
S-PLUS for ArcView GIS: Graphics
• Import existing S-PLUS Graphs
• Colour classification plots and pie / bar charts
• Two Step Graph Wizard with Plot Gallery
– 2D, 3D, Pie, Matrix, Multiple Axis, ...
– Trellis plots made easy!
• Spatial Neighbors builds weights between neighboring polygons
• Global Spatial Autocorrelation Indexes
– Moran’s I & Geary’s C measures
• Local Index of Spatial Association
• Spatial Linear Regression
– Model variables selected from themes or S-PLUS data frames
Spatial Statistics Menu
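The two global autocorrelation indexes named above can be sketched in a few lines of Python. This is an illustrative sketch, not the S-PLUS for ArcView GIS implementation: it assumes binary (0/1) neighbour weights, and the function name `moran_geary`, the neighbour dictionary format, and the four-region example are hypothetical.

```python
def moran_geary(values, neighbours):
    """Moran's I and Geary's C with binary neighbour weights.

    values     -- one observation per region
    neighbours -- dict mapping region index -> list of neighbouring indices
                  (w_ij = 1 if j is listed for i, else 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_sq = sum(d * d for d in dev)

    s0 = 0.0        # sum of all weights
    cross = 0.0     # sum of w_ij * (x_i - mean) * (x_j - mean)
    sq_diff = 0.0   # sum of w_ij * (x_i - x_j)^2
    for i, nbrs in neighbours.items():
        for j in nbrs:
            s0 += 1.0
            cross += dev[i] * dev[j]
            sq_diff += (values[i] - values[j]) ** 2

    moran_i = (n / s0) * (cross / sum_sq)
    geary_c = ((n - 1) / (2.0 * s0)) * (sq_diff / sum_sq)
    return moran_i, geary_c

# Hypothetical lattice: four regions in a row, each touching the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = moran_geary([1.0, 2.0, 3.0, 4.0], chain)        # similar neighbours
alternating = moran_geary([1.0, -1.0, 1.0, -1.0], chain)  # dissimilar neighbours
print(smooth, alternating)
```

Positive Moran's I (and Geary's C below 1) indicates that neighbouring regions tend to have similar values; the alternating example gives the opposite signature.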
S+EnvironmentalStats
• Add-on Module for S-PLUS
• Monitoring Water, Soil, and Air
• Use Statistics to Compare to “Background” and Look for Trends
[Figure: Lognormal Mixture Density with (mean1=5, cv1=1, mean2=20, cv2=0.5, p.mix=0.5); x-axis: Value of Random Variable, y-axis: Relative Frequency]
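The density in this figure can be approximated with a short Python sketch. The figure parameterizes each lognormal component by its mean and coefficient of variation (cv), which convert to log-scale parameters as sdlog = sqrt(log(1 + cv^2)) and meanlog = log(mean) - sdlog^2 / 2. The function names below are hypothetical, and treating p.mix as the weight on the second component is an assumption (with p.mix = 0.5 the choice makes no difference).

```python
import math

def lognormal_params(mean, cv):
    """Convert mean and coefficient of variation to (meanlog, sdlog)."""
    sdlog = math.sqrt(math.log(1.0 + cv * cv))
    meanlog = math.log(mean) - 0.5 * sdlog * sdlog
    return meanlog, sdlog

def lognormal_pdf(x, meanlog, sdlog):
    """Density of a lognormal distribution; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    zscore = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * zscore * zscore) / (x * sdlog * math.sqrt(2.0 * math.pi))

def lognormal_mixture_pdf(x, mean1, cv1, mean2, cv2, p_mix):
    """Two-component mixture; p_mix is assumed to weight the second component."""
    m1, s1 = lognormal_params(mean1, cv1)
    m2, s2 = lognormal_params(mean2, cv2)
    return (1.0 - p_mix) * lognormal_pdf(x, m1, s1) + p_mix * lognormal_pdf(x, m2, s2)

# Evaluate the density over the range shown on the slide (0 to 40).
for x in (1, 5, 10, 20, 40):
    print(x, round(lognormal_mixture_pdf(x, 5, 1, 20, 0.5, 0.5), 4))
```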
EnvironmentalStats Features
• Probability Density and Cumulative Density Plots
• QQ Plots for all Probability Distributions
• Estimation of Distribution Parameters and Quantiles, and Confidence Intervals
– Maximum Likelihood and Minimum Variance Unbiased
– Method of Moments
– L-Moments
• Additional Probability Distributions
– Generalized Extreme Value
– Lognormal Mixture
– 3 Parameter Lognormal
• Goodness-of-Fit Tests
– Chi-Square
– Kolmogorov-Smirnov
– PPCC
– Shapiro-Wilk
– Shapiro-Francia
[Figure: Histogram of Observed Data with Fitted Normal Distribution; x-axis: TcCB, y-axis: Relative Frequency]
[Figure: Empirical CDF for TcCB (solid line) with Fitted Normal CDF (dashed line); x-axis: Order Statistics for TcCB and Normal(mean=0.5985106, sd=0.2836408) Distribution, y-axis: Cumulative Probability]
[Figure: Quantile-Quantile Plot with 0-1 Line; x-axis: Quantiles of Normal(mean = 0.5985106, sd = 0.2836408), y-axis: Quantiles of TcCB]
Results of Shapiro-Wilk GOF
Hypothesized Distribution: Normal
Estimated Parameters: mean = 0.5985106, sd = 0.2836408
Data: TcCB in epa.94b.tccb.df
Subset With: Area == "Reference"
Sample Size: 47
Test Statistic: W = 0.9179198
Test Statistic Parameter: n = 47
P-value: 0.002830172
Results of Shapiro-Wilk GOF Test for TcCB
EnvironmentalStats Features
• Prediction and Tolerance Intervals
• Special Nonparametric Hypothesis Tests for Trend and Shift
– Seasonal Kendall’s Tau for Trend
– Quantile Test for Shift in Upper Tail
• Methods for Type I Singly and Multiply Censored Data
• Sample Size and Power Calculations and Plots
• Tools for Probabilistic Risk Assessment
– Latin Hypercube Sampling
– Generate Random Numbers from Different Distributions With a Specified Rank Correlation
• Built-In Data Sets and Extensive Help System
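Latin Hypercube Sampling, listed among the risk assessment tools above, can be illustrated with a small Python sketch. It is a generic version of the technique, not the EnvironmentalStats implementation; the function name `latin_hypercube` and its arguments are hypothetical. Each dimension is split into n equal-probability strata, one value is drawn in each stratum, and the strata are shuffled independently per dimension.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Draw n_samples points in the unit hypercube [0, 1)^n_dims.

    In every dimension, each of the n_samples equal-width strata
    contains exactly one point.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(k + rng.random()) / n_samples for k in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample = latin_hypercube(10, 2, rng)
for point in sample:
    print(tuple(round(u, 3) for u in point))
```

The uniform values can then be pushed through the inverse CDF of each input distribution to obtain stratified draws on the original scale.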
[Figure: Half-Width vs. Sample Size for Confidence Interval for p, with Confidence Level = 0.95 and p Hat = 0.5; x-axis: Sample Size (n), y-axis: Half-Width]
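The relationship in this plot can be sketched in Python. The slides do not state which interval method EnvironmentalStats uses, so the sketch assumes the normal-approximation (Wald) interval, whose half-width is z * sqrt(p(1 - p) / n); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def ci_half_width(n, p_hat=0.5, conf_level=0.95):
    """Half-width of the normal-approximation confidence interval for p."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def sample_size_for_half_width(half_width, p_hat=0.5, conf_level=0.95):
    """Smallest n whose normal-approximation half-width is at most half_width."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return math.ceil(z * z * p_hat * (1.0 - p_hat) / half_width ** 2)

# Sample sizes spanning the x-axis of the slide's plot.
for n in (600, 800, 1000, 1200, 1400):
    print(n, round(ci_half_width(n), 4))

print(sample_size_for_half_width(0.03))
```

Under this assumption the half-width falls from about 0.040 at n = 600 to about 0.026 at n = 1400, which is in line with the axis range of the plot.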
“The Help System Alone is Worth the Price of Admission”
EnvironmentalStats 2.0 (Beta)
Version 2.0 (in Beta) Has:
– Pull-Down Menus
– Power and Sample Size for Lognormal Distribution
– Optimal Box-Cox Transformations
– Simultaneous Prediction Intervals
– Nonparametric von Neumann Test for Serial Correlation
S-PLUS GIS Users
Natural Resources - Amoco, Commonwealth Edison, Hydro Quebec, Kimberly-Clark, Koch Industries, Philip Morris, Weyerhaeuser, Willamette Industries...
Marketing - AC Nielsen, Amazon.com, Canada Post, CTB McGraw Hill, Dairy Queen, J.D. Power and Associates, McDonald's, Rand Corporation, Reader's Digest, Sears Roebuck & Co, Time Warner …
Transportation - Airborne Express, American Airlines, Enterprise Rent A Car, Transport Canada...
Government - Centers for Disease Control, Department of Fisheries and Oceans, DOE, EPA, FAA, FCC, FDA, Federal Housing Administration, IRS, NIH, NIST, NOAA, Social Security Admin, US Air Force, US Forest Service, US Geological Survey, SAPD ...
Worldwide – NASA, US EPA, USGS, Centres for Disease Control
UK – NERC - Centre for Ecology and Hydrology, British Geological Survey, British Antarctic Survey, Macaulay Land Use Research Institute, BIOSS, CEFAS, MAFF, Marlab
EnvironmentalStats Users
• Government Agencies
– EPA, USGS, etc.
• Commercial Consultants
– CH2M Hill, Exponent
• Academics
– Environmental Engineering, Biostatistics, Environmental Health, Mathematics, etc.
• Students
• People Outside the Environmental Field!
– Merck
– Lockheed Martin
Questions Posed
• Point Patterns 1 – Random / Clustered – Intensity
• Point Patterns 2 – Cross Spectral Analysis
• Point Patterns 3 – Mark Correlation Functions
• Lattice Data 1 – Spatial Regression Methods for Normal Data
• Lattice Data 2 – Spatial Regression Methods for Non-Normal Data
• Lattice Data 3 – Spatial Smoothing Methods
• Geostatistical Data 1 – Variograms and Kriging
• Hybrid Patterns 1 – Cross Spectral Analysis
• Hybrid Patterns 2 – Bayesian Hierarchical Models
?
Live Demo Time!
• Writing a Presentation on Spatial Statistics
• User Input (mostly at a Spatial Conference)
• 2 Major Advantages …
Geostatistical Data
• Variogram plots and boxplots and clouds
• Directional variograms and Correlograms for Exploring Anisotropy
• Empirical Variogram Estimation including Robust Methods
• Variogram Models including Spherical and Exponential
• Ordinary and Universal Kriging
• Block and Point Kriging Prediction at arbitrary Location with Standard Errors
• Parametric and Non-parametric Trend Surfaces
Point Patterns
• Point Maps that Include Region Boundaries
• Spatial Randomness Tests
• Ripley’s K-Function
• Simulation of Spatial Random Processes
• Local Intensity Estimation
Lattice Data
• “Binning” of High Density Data into a Regular Lattice of Counts
• Geary and Moran Spatial Autocorrelation coefficients
• Spatial Regression Models including Conditional and Simultaneous Autoregressive Models
• Nearest Neighbour Search
• Visualisation of Neighbour Structures
1) S-PLUS GIS Toolbox
2) It’s in S-PLUS!
Advanced Graphics
• Exclusive Trellis Graphics
• 3D Plotting and Spinning
• Contour Plots
• Overlaying Plots
• Brush and Spin Environment
• Export to Large Number of Formats
• Java Graphlets
• Imaging Plots
• Hexagonal Binning
“S” Language
• Powerful Language
• Excel Integration
• Call from ArcView with Link
• Full Visibility and Customisation
• C, C++, Fortran and Java Connectivity
• 100,000 + User Community
Statistics
• Cluster Analysis
• Tree Models
• Advanced Regression
• Data Mining Tools
• Linear and Non-Linear Mixed Effects
• Missing Data