regression analysis of phosphorous loading data for the maumee river, water years 2003-2005 charlie...

Post on 21-Dec-2015


Regression Analysis of Phosphorus Loading Data for the Maumee River, Water Years 2003-2005

Charlie Piette

David Dolan

Pete Richards

Department of Natural and Applied Sciences

University of Wisconsin Green Bay

National Center for Water Quality Research, Heidelberg College

Phosphorus and the Great Lakes Water Quality Agreement

• Goal for reduction

• Initial targets

• Secondary targets

Maumee River Watershed


Maumee River Facts

• Size

• Contribution

Data Source

• USGS

• NCWQR

• Used data from WY 2003-2005

Purpose of Our Research

• ECOFORE 2006: Hypoxia Assessment in Lake Erie

• Estimate TP loads to Lake Erie using data from Heidelberg College and effluent data from permitted point sources

• Constructing a daily time series of phosphorus loading (Maumee River)

Problems in Constructing a Time Series for the Maumee

• Missing data

• All three years missing some data

• No major precipitation events were missed in water years 2003 and 2004

• 2005……..

Water Year 2005 Data Overview

• Missing an important time period

• December 2004 to January 2005, while the lab was being moved

• Very significant period of precipitation

• 32.8 inches of snow in January ’05

• Third wettest January on record

• Warm temperatures: 52 °F on New Year’s Day

Importance of WY 2005

• Fifth-largest peak flow in the 73-year data record: 94,100 cfs

• Roughly an order of magnitude larger than the average flows for the same period in WY 2003 and WY 2004

• 3,437 cfs and 10,039 cfs, respectively

• Need to model the missing data to complete the time series

Objectives

• Use statistical analysis to develop a model for predicting missing TP for the Maumee in WY 2005

• Calculate an annual load for WY 2005 using measured and predicted data

• Compare estimated regression load to estimated load from another method

• Assess effectiveness of final regression model on other Lake Erie Tributaries

Reconstructing the Missing Concentration Data

• Multiple regression with SAS

• Producing an equation that can be used to model for the missing phosphorus concentrations

Basic Regression Equation

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$

• The terms...

[Figure: plot with LnFlow on the x-axis (5.0 to 11.0) and y-axis ticks from -3.0 to 0.0]

Basic Assumption of Regression

• Linear relationship between dependent and independent variables

[Figure: plot with LnFlow on the x-axis (4 to 12) and y-axis ticks from -3.5 to 0.0]

Basic Assumptions: Continued

• Normal distribution of residuals

So, the data is suitable for regression analysis. What makes for a strong model?

• Hypothesis for model significance

• Hypothesis for parameter estimate significance

• P-values: < 0.05

• R² value

• Mean square error (MSE)
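The last two criteria can be computed directly from observed and fitted values. The sketch below is illustrative only: the function name `r_squared_and_mse` is hypothetical, the numbers are made up (not Maumee data), and MSE is assumed to follow the usual regression convention of dividing the residual sum of squares by the residual degrees of freedom.

```python
def r_squared_and_mse(observed, predicted, n_params):
    """Return (R-squared, MSE) for a fitted regression.

    MSE is the residual sum of squares divided by the residual
    degrees of freedom (n - n_params).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual (unexplained) and total sums of squares.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst, sse / (n - n_params)


# Made-up values for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, mse = r_squared_and_mse(obs, pred, n_params=2)
print(f"R2 = {r2:.4f}, MSE = {mse:.4f}")
```

A higher R² and a lower MSE both point to a tighter fit, which is how the three candidate models are compared later in the deck.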

Beale’s Equation

Beale’s Ratio Estimator

• Daily load for sampled days

• Mean daily load

• Flow-adjusted mean daily load

• Bias-corrected

• × 365 = annual load estimate

Date          Flow         P_Concentration
10/1/2003     10644.720    0.346
10/2/2003     7858.308     .
10/3/2003     5656.312     0.300
10/4/2003     4195.272     0.239
10/5/2003     2974.260     0.226
10/6/2003     2629.872     0.207
10/7/2003     2222.868     0.181
10/8/2003     1961.968     0.174
10/9/2003     1909.788     0.163
10/10/2003    1377.552     .
10/11/2003    1116.652     .
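The estimator steps listed above can be sketched on the eleven rows of the sample table. The slides do not give the formula, so this uses one commonly cited form of Beale's bias-corrected ratio estimator (without a finite-population term), which is an assumption. Also assumed: flow is in cfs, concentration is in mg/L (treated as equal to ppm), and a "." in the table marks a day without a measured concentration. The function name is hypothetical.

```python
# 1 cfs * 1 mg/L expressed in metric tons per day:
# 28.316846592 L per cubic foot * 86400 s per day * 1e-9 metric tons per mg.
CFS_MGL_TO_TONNES_PER_DAY = 28.316846592 * 86400 * 1e-9


def beale_mean_daily_load(flows, concs):
    """Bias-corrected, flow-adjusted mean daily load (metric tons/day).

    flows: mean daily flow for every day in the period (cfs, assumed).
    concs: concentration per day (mg/L, assumed), None when not sampled.
    """
    # Daily load on sampled days only.
    sampled = [(q, q * c * CFS_MGL_TO_TONNES_PER_DAY)
               for q, c in zip(flows, concs) if c is not None]
    n = len(sampled)
    q_bar = sum(q for q, _ in sampled) / n       # mean flow, sampled days
    l_bar = sum(l for _, l in sampled) / n       # mean load, sampled days
    s_lq = sum((q - q_bar) * (l - l_bar) for q, l in sampled) / (n - 1)
    s_qq = sum((q - q_bar) ** 2 for q, _ in sampled) / (n - 1)
    mu_q = sum(flows) / len(flows)               # mean flow, all days
    # Flow adjustment (ratio) times the bias-correction factor.
    correction = (1 + s_lq / (n * l_bar * q_bar)) / (1 + s_qq / (n * q_bar ** 2))
    return mu_q * (l_bar / q_bar) * correction


# The eleven days from the sample table; None replaces ".".
flows = [10644.720, 7858.308, 5656.312, 4195.272, 2974.260, 2629.872,
         2222.868, 1961.968, 1909.788, 1377.552, 1116.652]
concs = [0.346, None, 0.300, 0.239, 0.226, 0.207,
         0.181, 0.174, 0.163, None, None]

mean_load = beale_mean_daily_load(flows, concs)
window_load = mean_load * len(flows)  # the slides multiply by 365 for a year
print(f"{mean_load:.3f} t/day, {window_load:.2f} t over {len(flows)} days")
```

Multiplying by the number of days in the window mirrors the "× 365" step; an eleven-day window is far too short for a real estimate and is used here only to show the mechanics.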

Beale Stratified Ratio Estimator

• Stratification by flow or time

• More accurate estimation

• “It’s an art!”

Beale vs. Regression

• Both are a means to the same end: an annual load estimate

• Both rely on one main assumption: a linear relationship

• Big difference: Beale is not good for reconstructing a time series

Regression Analysis

Data Analysis Step 1

• Transforming the data to log space

[Figure: plot with Flow on the x-axis (0 to 70,000) and y-axis ticks from 0.0 to 0.8]

[Figure: plot with LnFlow on the x-axis (4 to 12) and y-axis ticks from -3.5 to 0.0]

Regression Model 1

• Log P-Conc = b0 + b1(Log Flow) + error

• Simplest model

• Historical use
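Model 1 is an ordinary least-squares line in log space. The slides used SAS; the sketch below is a plain-Python stand-in, not the authors' code. It assumes natural logs (the plots are labelled LnFlow), the function name `fit_log_log` is hypothetical, and the data are synthetic values generated from known coefficients so the fit can be checked.

```python
import math


def fit_log_log(flows, concs):
    """Least-squares fit of ln(conc) = b0 + b1 * ln(flow); returns (b0, b1)."""
    x = [math.log(q) for q in flows]
    y = [math.log(c) for c in concs]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Synthetic data built from known coefficients (not Maumee data).
true_b0, true_b1 = -3.0, 0.2
flows = [500.0, 1000.0, 5000.0, 20000.0, 60000.0]
concs = [math.exp(true_b0 + true_b1 * math.log(q)) for q in flows]

b0, b1 = fit_log_log(flows, concs)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Because the synthetic concentrations contain no noise, the fit recovers the generating coefficients; real data would scatter around the line.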

Regression Model 2

• Log P-Conc = b0 + b1(Log Flow) + b2(Season) + error

• Addition of second independent variable “Season”

• Dual Slope Analysis

Purpose of adding “Season”

[Figure: plot with LnFlow on the x-axis (5.0 to 11.0) and y-axis ticks from -3.0 to 0.0]

Regression Model 3

• Log P-Conc = b0 + b1(Log Flow) + b2(Season) + b3(Season Effect) + error

• Addition of “Season Effect”

• Interaction variable

Purpose of adding “Season Effect”

• Interaction between two independent variables

• Slope adjustment

• Change in log TP concentration per unit of log flow during the winter season
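The slope adjustment can be made concrete with the WY 2005 Model 3 estimates reported later in these slides. Two things here are assumptions rather than statements from the deck: that Season is a 0/1 indicator with 1 meaning winter, and that Season Effect is the product Season × LnFlow. The function names are hypothetical.

```python
import math

# WY 2005 Model 3 estimates as reported in the results table of this deck.
WY2005 = {"b0": -2.2586, "b1": 0.0451, "b2": -2.666, "b3": 0.3297}


def model3_ln_conc(flow_cfs, season, coef=WY2005):
    """Predicted ln(TP concentration) under Model 3."""
    ln_flow = math.log(flow_cfs)
    season_effect = season * ln_flow  # assumed form of the interaction term
    return (coef["b0"] + coef["b1"] * ln_flow
            + coef["b2"] * season + coef["b3"] * season_effect)


def slope_and_intercept(season, coef=WY2005):
    """Effective (slope, intercept) of the ln-ln line for a given season."""
    return coef["b1"] + coef["b3"] * season, coef["b0"] + coef["b2"] * season


for season in (0, 1):
    slope, intercept = slope_and_intercept(season)
    print(f"Season={season}: slope={slope:.4f}, intercept={intercept:.4f}")
```

Under these assumptions, b2 shifts the intercept and b3 changes the slope when Season = 1, so the model fits two separate lines with a single regression.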

Results of Regression Models for the Maumee, WY 2005

Selecting the Best Model for WY 2005

• Model 1 Results

            Intercept   Log Flow   R²       Overall Model Significance   Mean Square Error
Estimate    -3.1743     0.173      0.3091                                0.1059
P-Value     <.0001      <.0001              <.0001

Selecting the Best Model for WY 2005

• Model 2 Results

            Intercept   Log Flow   Season    R²       Overall Model Significance   Mean Square Error
Estimate    -3.3331     0.2004     -0.1124   0.3218                                0.1043
P-Value     <.0001      <.0001     0.0167             <.0001

Selecting the Best Model for WY 2005

• Model 3 Results

            Intercept   Log Flow   Season    Season Effect   R²       Model Significance   MSE
Estimate    -2.2586     0.0451     -2.666    0.3297          0.4956                        0.0778
P-Value     <.0001      0.0405     <.0001    <.0001                   <.0001

Results of Regression Model 3 for the Maumee, WY 2003-2004

Model 3: Viable Option?

• Looked like a good choice for WY 2005

• Ran with WY 2003-2004 data

Water Year               Intercept   Log Flow   Season    Season Effect   R²       Model Significance
2003         Estimate    -3.9067     0.2893     -0.0442   0.0482          0.6061
             P-Value     <.0001      <.0001     0.0462    0.0856                   <.0001
2004         Estimate    -3.511      0.2549     -1.8283   0.1745          0.6454
             P-Value     <.0001      <.0001     <.0001    <.0001                   <.0001
2005         Estimate    -2.2586     0.0451     -2.666    0.3297          0.4956
             P-Value     <.0001      0.0405     <.0001    <.0001                   <.0001

Estimating an Annual TP Load Using Regression Results

Estimating an Annual Load With Regression

• Used Model 3

• Need to bring the log TP concentrations out of log-space (back-transforming)

• Back-transforming bias and estimated concentrations

Bias Correction

• To make up for the low bias from back-transforming:

• Total Phosphorus Concentration (ppm) = Exp[Log Predicted P Concentration + (Mean Square Error × 0.5)]

• Estimating annual TP load from both measured and estimated data

• A couple of conversion factors give the annual estimated load in metric tons/year
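The back-transformation and load calculation can be sketched as follows. The bias correction is the one stated above; everything else is an assumption for illustration: flow in cfs, ppm treated as mg/L, the conversion factor derived from those units, the function names, and the three-day toy input. The MSE of 0.0778 is the WY 2005 Model 3 value from the results table.

```python
import math

# 1 cfs * 1 mg/L expressed in metric tons per day (assumed units):
# 28.316846592 L per cubic foot * 86400 s per day * 1e-9 metric tons per mg.
CFS_PPM_TO_TONNES_PER_DAY = 28.316846592 * 86400 * 1e-9


def back_transform(ln_conc_pred, mse):
    """Bias-corrected concentration: exp(prediction + 0.5 * MSE)."""
    return math.exp(ln_conc_pred + 0.5 * mse)


def total_load(flows_cfs, measured_ppm, predicted_ln_ppm, mse):
    """Sum daily loads, using the measured concentration when one exists
    and the bias-corrected model prediction otherwise (metric tons)."""
    total = 0.0
    for q, c_meas, ln_c in zip(flows_cfs, measured_ppm, predicted_ln_ppm):
        conc = c_meas if c_meas is not None else back_transform(ln_c, mse)
        total += q * conc * CFS_PPM_TO_TONNES_PER_DAY
    return total


MSE_WY2005 = 0.0778  # Model 3 mean square error from the results table

# Toy three-day series (made up): the middle day has no measurement.
flows = [5000.0, 8000.0, 6000.0]
measured = [0.25, None, 0.22]
predicted_ln = [-1.4, -1.3, -1.5]

load = total_load(flows, measured, predicted_ln, MSE_WY2005)
factor = back_transform(0.0, MSE_WY2005)  # multiplicative correction alone
print(f"correction factor = {factor:.4f}, load = {load:.3f} t")
```

With an MSE of 0.0778 the correction raises each back-transformed concentration by roughly 4 percent; summing the daily loads over a full water year would give the annual figure.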

What Did We Find?

Major Purpose of Our Research

• The main objective: developing a daily time series for accurately estimating an annual load for the Maumee in 2005

How did the Regression Estimates Compare to the Beale Estimate?

• 95% Confidence Intervals

Water Year   Regression Estimate   Beale Estimate       95% Confidence Interval
             (Metric Ton/Year)     (Metric Ton/Year)
2003         2348.461              2341.401             2260.046 - 2422.757
2004         1905.47               1925.267             1829.385 - 2021.149
2005         2029.856              3134.59              2911.204 - 3357.975
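The comparison in the table can be checked with a few lines of arithmetic. The values below are copied from the table; each interval is centred on the Beale estimate. The dictionary layout and the function name `compare` are illustrative choices.

```python
# Values copied from the comparison table (metric tons/year).
results = {
    2003: {"regression": 2348.461, "beale": 2341.401, "ci": (2260.046, 2422.757)},
    2004: {"regression": 1905.47, "beale": 1925.267, "ci": (1829.385, 2021.149)},
    2005: {"regression": 2029.856, "beale": 3134.59, "ci": (2911.204, 3357.975)},
}


def compare(row):
    """Return (regression estimate inside the interval?, % difference from Beale)."""
    low, high = row["ci"]
    inside = low <= row["regression"] <= high
    pct_diff = 100.0 * (row["regression"] - row["beale"]) / row["beale"]
    return inside, pct_diff


summary = {year: compare(row) for year, row in results.items()}
for year, (inside, pct) in summary.items():
    print(f"WY {year}: inside interval = {inside}, difference = {pct:+.1f}%")
```

For WY 2003 and WY 2004 the regression estimate falls inside the interval. For WY 2005 it falls outside, about 35 percent below the Beale estimate.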

The Discrepancy

[Figure: normal quantile plot for the WY 2005 Model 3 fit; x-axis Normal Quantile (-3 to 3), y-axis ticks from -1.00 to 1.00. Annotation: LnP_Concentration = -2.2585 + 0.0451 LnFlow - 2.666 Season + 0.3297 Season_effect; N = 313; Rsq = 0.4956; Adj Rsq = 0.4907; RMSE = 0.2789]

Problem with Regression

• Under-prediction

• Low-flow bias

Future Directions

• Improving the regression model

• Other independent variables

• More years

Thank You

Any Questions?
