multivariate analysis of variance for stream classification in texas

Multivariate Analysis of Variance for Stream Classification in Texas Eric S. Hersh CE397 – Statistics in Water Resources Term Project Cinco de Mayo, 2009

Multivariate Analysis of Variance for Stream Classification in Texas

Eric S. Hersh

CE397 – Statistics in Water Resources

Term Project

Cinco de Mayo, 2009

Page 2: Multivariate Analysis of Variance for Stream Classification in Texas

Can we quantitatively regionalize the streams of Texas?

Page 3: Multivariate Analysis of Variance for Stream Classification in Texas

East Texas

North-Central Texas

West Texas

South-Central Texas

Lower Rio Grande Basin

Hersh, E.S., Maidment, D.R., and W.S. Gordon. “An Integrated Stream Classification System to Support Environmental Flow Analyses in Texas.” J. Am. Water Res. Assoc. Submitted November 2008.

Page 4: Multivariate Analysis of Variance for Stream Classification in Texas

Revisited - the question posed

Can we improve the way in which we perform the regionalization and thus (potentially) increase its classification strength?

Page 5: Multivariate Analysis of Variance for Stream Classification in Texas

ANOVA: Analysis of Variance

Purpose: test whether group means are different

MANOVA: Multivariate Analysis of Variance

Purpose: ANOVA with several dependent variables
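As a small illustration of the ANOVA idea above, the sketch below computes the one-way F statistic (between-group mean square divided by within-group mean square) for a single dependent variable. The function name `one_way_f` and the toy numbers are hypothetical and are not taken from the project data.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    Each group is a list of numeric observations of ONE dependent variable.
    Assumes at least two groups and some within-group variation.
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of the group means around the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical example: one variable measured in two groups.
f_stat = one_way_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

A large F suggests the group means differ by more than the within-group scatter would explain. MANOVA extends the same comparison to a vector of dependent variables per observation.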

Page 6: Multivariate Analysis of Variance for Stream Classification in Texas

The Model

• Multiple metric dependent variables (n=18)

• Based on categorical (non-metric) independent variables (n=5 regions)

• Manipulate independent variables to determine effect on dependent variables using SAS PROC GLM (general linear model)

Region = DO ± Temp ± TSS ± pH ± Cond ± AirTemp ± Precip ± PET ± MAQ ± MAV ± BFI ± ZeroQ ± IQR ± Slope ± Substrate ± Sand ± Silt ± Clay

Page 7: Multivariate Analysis of Variance for Stream Classification in Texas

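ANOVA: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$

MANOVA: $H_0: \begin{bmatrix} \mu_{11} \\ \mu_{21} \\ \vdots \\ \mu_{p1} \end{bmatrix} = \begin{bmatrix} \mu_{12} \\ \mu_{22} \\ \vdots \\ \mu_{p2} \end{bmatrix} = \dots = \begin{bmatrix} \mu_{1k} \\ \mu_{2k} \\ \vdots \\ \mu_{pk} \end{bmatrix}$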

where:

p = parameter (dependent variable)

k = factor (independent variable)

Page 8: Multivariate Analysis of Variance for Stream Classification in Texas

Data Gaps

• Total number of subbasins in Texas = 205

• Number with complete data = 103

Uh oh! This test is going to lose a lot of value. Unless…

• Can we fill in the gaps somehow?

Page 9: Multivariate Analysis of Variance for Stream Classification in Texas

Data Gaps

• Some of the subbasins in Texas have no rivers.

• Many have no gages.

• Many have no WQ sampling stations.

– Synthetic data would be difficult and poor.

• But, the MANOVA test requires complete matrices.

– Solution: fill in gaps with parameter means

– Dilutes strength of classification (regions tend toward others)
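The gap-filling step can be sketched as column-mean imputation: each missing value is replaced by the mean of the observed values for that parameter. The function name `fill_with_parameter_means`, the use of `None` to mark a gap, and the toy matrix are assumptions for illustration only. The slides do not say how the filling was actually carried out.

```python
def fill_with_parameter_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    rows: list of equal-length lists, one row per subbasin, one column per
    parameter. None marks a missing value (an assumed convention).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values to average")
        means.append(sum(observed) / len(observed))

    # Build a new matrix so the input is left unchanged.
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


# Hypothetical 3-subbasin, 2-parameter matrix with two gaps.
filled = fill_with_parameter_means([[1.0, None], [3.0, 4.0], [None, 8.0]])
```

Because every filled cell gets the same statewide mean, filled subbasins look more alike than they really are, which is the dilution effect noted above.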

Page 10: Multivariate Analysis of Variance for Stream Classification in Texas

Hypothesis Test

• Null Hypothesis: (vectors of) the group means are equal

Of course not! That’s preposterous! There would be no regionalization!

But… we don’t care.

Page 11: Multivariate Analysis of Variance for Stream Classification in Texas

[Figure: West vs. East; data source PRISM, 1971-2000]

Page 12: Multivariate Analysis of Variance for Stream Classification in Texas

Evaluating the Model

• Pillai’s trace considered most robust

– K.C.S. Pillai, 1920-1985, Indian statistician
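For reference, Pillai's trace is V = trace(H (H + E)^-1), where H is the between-group and E the within-group sums-of-squares-and-cross-products matrix. The sketch below computes it in plain Python. The function names, the small Gauss-Jordan solver, and the toy data are illustrative assumptions; the project itself used SAS PROC GLM.

```python
def solve(a, b):
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pillai_trace(groups):
    """Pillai's trace for a list of groups of equal-length observation vectors."""
    p = len(groups[0][0])
    all_obs = [x for g in groups for x in g]
    grand = [sum(x[j] for x in all_obs) / len(all_obs) for j in range(p)]

    h = [[0.0] * p for _ in range(p)]   # between-group SSCP
    e = [[0.0] * p for _ in range(p)]   # within-group SSCP
    for g in groups:
        mean = [sum(x[j] for x in g) / len(g) for j in range(p)]
        d = [mean[j] - grand[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                h[i][j] += len(g) * d[i] * d[j]
                e[i][j] += sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in g)

    t = [[h[i][j] + e[i][j] for j in range(p)] for i in range(p)]
    # trace((H + E)^-1 H) equals trace(H (H + E)^-1) by the cyclic property.
    x = solve(t, h)
    return sum(x[j][j] for j in range(p))


# Hypothetical data: two groups, two dependent variables each.
v = pillai_trace([[(1, 2), (2, 1), (3, 3)], [(6, 7), (7, 6), (8, 8)]])
```

The statistic ranges from 0 (identical group mean vectors) up to min(p, number of groups minus 1), so larger values indicate stronger separation between regions.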

Page 13: Multivariate Analysis of Variance for Stream Classification in Texas

Revision Methodology

1. Identify bordering subbasins (n=50, but 10 border multiple, so 60 trials total)

2. Switch one subbasin, check for increase in test stat, record and reset (21 deemed beneficial)

3. Rank by improvement

4. Implement changes in order, discard if decline (18 kept)

5. View in geographic context, apply decision rules (no islands or peninsulas, 15 kept)
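Steps 2 through 4 of the revision methodology amount to a greedy search, sketched below under simplifying assumptions. The `score` function is a stand-in test statistic (the between-region share of total variance for a single variable, which equals Pillai's trace when there is only one dependent variable). The names `score` and `revise`, the data layout, and the toy values are all hypothetical. Step 1 (finding bordering subbasins) is taken as an input list of trials, and step 5 (geographic decision rules) is not modeled.

```python
def score(values, assignment):
    """Stand-in test statistic: between-region sum of squares / total sum of squares."""
    by_region = {}
    for basin, region in assignment.items():
        by_region.setdefault(region, []).append(values[basin])
    all_vals = [values[b] for b in assignment]
    grand = sum(all_vals) / len(all_vals)
    ssb = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_region.values())
    sst = sum((x - grand) ** 2 for x in all_vals)
    return ssb / sst


def revise(values, assignment, trials):
    """Greedy regional revision.

    trials: list of (basin, neighboring_region) switches to evaluate.
    Returns the revised assignment and the list of switches that were kept.
    """
    base = score(values, assignment)

    # Step 2: try each switch on its own, record the gain, then reset.
    beneficial = []
    for basin, new_region in trials:
        trial = dict(assignment)
        trial[basin] = new_region
        gain = score(values, trial) - base
        if gain > 0:
            beneficial.append((gain, basin, new_region))

    # Step 3: rank by improvement, largest first.
    beneficial.sort(key=lambda t: t[0], reverse=True)

    # Step 4: apply switches in order, discarding any that now lower the score.
    current, current_score, kept = dict(assignment), base, []
    for _, basin, new_region in beneficial:
        trial = dict(current)
        trial[basin] = new_region
        s = score(values, trial)
        if s > current_score:
            current, current_score = trial, s
            kept.append((basin, new_region))
    return current, kept


# Hypothetical example: five subbasins, one variable, two regions.
values = {"a": 1.0, "b": 2.0, "c": 9.0, "d": 10.0, "e": 11.0}
start = {"a": "W", "b": "W", "c": "W", "d": "E", "e": "E"}
revised, kept = revise(values, start, [("c", "E"), ("b", "E")])
```

Re-checking each switch in step 4 matters because gains measured in isolation do not necessarily add up: a switch that helped against the original map can hurt once earlier switches are in place.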