Application of Statistical Techniques to Interpretation of Water Monitoring Data

Eric Smith, Golde Holtzman, and Carl Zipper


Post on 22-Feb-2016


Page 1: Application of Statistical Techniques to Interpretation of Water Monitoring Data

Application of Statistical Techniques to Interpretation of Water Monitoring Data

Eric Smith, Golde Holtzman, and Carl Zipper

Page 2

Outline

I. Water quality data: program design (CEZ, 15 min)

II. Characteristics of water-quality data (CEZ, 15 min)

III. Describing water quality (GIH, 30 min)

IV. Data analysis for making decisions

A. Compliance with numerical standards (EPS, 45 min)

Dinner Break

B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min)

C. Detection of water-quality trends (GIH, 60 min)

Page 3

III. Describing water quality (GIH, 30 min)

• Rivers and streams are an essential component of the biosphere
• Rivers are alive
• Life is characterized by variation
• Statistics is the science of variation
• Statistical Thinking/Statistical Perspective
  – Thinking in terms of variation
  – Thinking in terms of distribution

Page 4

The present problem is multivariate

• WATER QUALITY as a function of TIME, under the influence of covariates like FLOW, at multiple LOCATIONS

Page 5

WQ variable versus time

[Figure: Water Variable versus Time in Years]

Page 6

Bear Creek below Town of Wise STP

[Figure: pH (6.5 to 9) versus DATE, 1973/12/14 to 1993/12/14]

Page 7

Univariate WQ Variable

[Figure: Water Quality versus Time]

Page 8

Univariate WQ Variable

[Figure: Water Quality versus Time, with several Water Quality panels]

Page 9

Univariate Perspective, Real Data (pH below STP)

[Figure: distribution of pH, axes from 6.5 to 9]

Page 10

The three most important pieces of information in a sample:

• Central Location
  – Mean, Median, Mode
• Dispersion
  – Range, Standard Deviation, Inter-Quartile Range
• Shape
  – Symmetry, skewness, kurtosis
  – No mode, unimodal, bimodal, multimodal

Page 11

Central Location: Sample Mean

• (Sum of all observations) / (sample size)
• Center of gravity of the distribution
• Depends on each observation
• Therefore sensitive to outliers
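A minimal Python sketch of the sample mean as defined above, using a small illustrative sample. The helper name `sample_mean` is hypothetical. The second check illustrates the "center of gravity" idea: the deviations from the mean balance, so they sum to zero.

```python
def sample_mean(values):
    """Sum of all observations divided by the sample size."""
    values = list(values)
    if not values:
        raise ValueError("sample must contain at least one observation")
    return sum(values) / len(values)


sample = [1, 1, 0, 2, 3]
mean = sample_mean(sample)

# "Center of gravity": positive and negative deviations from the mean
# cancel, so their sum is zero (up to floating-point rounding).
deviation_sum = sum(y - mean for y in sample)

print(mean)                        # 1.4
print(abs(deviation_sum) < 1e-12)  # True
```

Because every observation enters the sum, changing any single value (for example, an outlier) moves the mean.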


Page 17

Central Location: Sample Median

• Center of the ordered array
• I.e., the (0.5)(n + 1)th observation in the ordered array

If sample size n is odd, then the median is the middle value in the ordered array.

Example A: 1, 1, 0, 2, 3
Order: 0, 1, 1, 2, 3
n = 5, odd; (0.5)(n + 1) = 3
Median = 1

If sample size n is even, then the median is the average of the two middle values in the ordered array.

Example B: 1, 1, 0, 2, 3, 6
Order: 0, 1, 1, 2, 3, 6
n = 6, even; (0.5)(n + 1) = 3.5
Median = (1 + 2)/2 = 1.5
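The (0.5)(n + 1) location rule can be sketched in a few lines of Python and checked against Examples A and B. `median_by_location` is a hypothetical helper name; Python's built-in `statistics.median` gives the same results for these samples.

```python
import math


def median_by_location(values):
    """Median as the (0.5)(n + 1)th observation of the ordered array.

    When the location falls halfway between two ranks (n even), the
    median is the average of the two middle observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    location = 0.5 * (n + 1)          # 1-based rank, e.g. 3 or 3.5
    lower = math.floor(location)      # rank at or just below the location
    upper = math.ceil(location)       # rank at or just above the location
    # Ranks are 1-based on the slide; Python lists are 0-based.
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


print(median_by_location([1, 1, 0, 2, 3]))     # Example A: 1.0
print(median_by_location([1, 1, 0, 2, 3, 6]))  # Example B: 1.5
```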

Page 18

Central Location: Sample Median

• Center of the ordered array
• Depends on the magnitude of the central observations only
• Therefore NOT sensitive to outliers


Page 24

Central Location: Mean vs. Median

• Mean is influenced by outliers
• Median is robust against (resistant to) outliers
• Mean “moves” toward outliers
• Median almost always represents the bulk of the observations

Comparison of mean and median tells us about outliers
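A small Python illustration of the mean "moving" toward an outlier while the median stays put, using Example B from the median slide. The outlier (6 recorded as 60) is hypothetical.

```python
from statistics import mean, median

sample = [1, 1, 0, 2, 3, 6]          # Example B from the median slide
with_outlier = [1, 1, 0, 2, 3, 60]   # hypothetical: 6 mis-recorded as 60

for label, data in [("original", sample), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The median is 1.50 in both cases, while the mean jumps from about 2.17 to about 11.17; a large gap between mean and median is therefore a hint that outliers may be present.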

Page 25

Dispersion

• Range
• Standard Deviation
• Inter-Quartile Range

Page 26

Dispersion: Range

• Maximum − Minimum
• Easy to calculate
• Easy to interpret
• Depends on sample size (biased)
• Therefore not good for statistical inference
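One way to see why the range depends on sample size: as observations are added, the span between maximum and minimum can only stay the same or widen. The Python sketch below shows this for the ten-observation sample used on the quartile slides; it illustrates the dependence but is not a proof of bias. `sample_range` is a hypothetical helper name.

```python
def sample_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


# Ten-observation sample from the quartile slides, in the order listed there.
sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]

# Range of the first k observations for k = 2, ..., n.  Adding observations
# can never narrow the span between max and min.
ranges = [sample_range(sample[:k]) for k in range(2, len(sample) + 1)]
print([round(r, 2) for r in ranges])  # [3.1, 3.1, 3.1, 5.3, 8.2, 8.2, 8.2, 8.2, 8.2]
```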

Page 27

Dispersion: Standard Deviation

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$

[Figure: example distributions with ±1 SD and ±2 SD marked]
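The SD formula above translates directly into Python; the sketch below applies it to the ten-observation sample from the quartile slides and cross-checks it against `statistics.stdev`, which also uses the n − 1 divisor. `sample_sd` is a hypothetical helper name.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = sum(values) / n
    sum_sq = sum((y - ybar) ** 2 for y in values)  # sum of squared deviations
    return math.sqrt(sum_sq / (n - 1))


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
sd = sample_sd(sample)
print(round(sd, 2))                                # 2.56
print(math.isclose(sd, statistics.stdev(sample)))  # True
print(sample_sd([5, 5, 5]))                        # 0.0: no variation
```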

Page 28

Dispersion: Properties of SD

• SD ≥ 0 for all data
• SD = 0 if and only if all observations are the same (no variation)
• Familiar intervals for a normal distribution:
  – 68% expected within 1 SD
  – 95% expected within 2 SD
  – 99.7% expected within 3 SD
  – Accurate (to rounding) for a normal distribution, only a ballpark for other distributions
• For any distribution, at least 89% of observations lie within 3 SD (Chebyshev's inequality: at least 1 − 1/3² = 8/9)
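The normal-distribution percentages and the distribution-free Chebyshev bound can be compared with Python's `statistics.NormalDist` (Python 3.8+). This is a sketch for checking the numbers on the slide, not part of the original material.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

coverage = {}
for k in (1, 2, 3):
    # Exact proportion of a normal distribution within k SD of the mean.
    normal_share = std_normal.cdf(k) - std_normal.cdf(-k)
    # Chebyshev's inequality: guaranteed minimum for any distribution
    # with finite variance (uninformative for k = 1).
    chebyshev_min = 1 - 1 / k**2
    coverage[k] = (normal_share, chebyshev_min)
    print(f"k = {k}: normal {normal_share:.4f}, any distribution >= {chebyshev_min:.4f}")
```

The normal shares come out near 0.6827, 0.9545, and 0.9973, while the Chebyshev minimums are 0, 0.75, and about 0.889.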

Page 29

Interpretation of SD

[Figure: distribution of pH, 6.5 to 9]

n = 200, SD = 0.41, Median = 7.6, Mean = 7.6
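Applying the familiar intervals to the reported pH summary is simple arithmetic; the sketch below computes mean ± k SD. The 68/95/99.7% readings apply only if the pH distribution is roughly normal; the agreement of mean and median (both 7.6) is consistent with symmetry but does not establish normality.

```python
mean_ph = 7.6   # reported sample mean (n = 200)
sd_ph = 0.41    # reported sample SD

intervals = {}
for k in (1, 2, 3):
    low, high = mean_ph - k * sd_ph, mean_ph + k * sd_ph
    intervals[k] = (low, high)
    print(f"mean +/- {k} SD: {low:.2f} to {high:.2f}")
```

For example, roughly 95% of pH values would be expected between 6.78 and 8.42 under approximate normality.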

Page 30

Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot

Maximum = 4th quartile = 100th percentile = 1.00 quantile
3rd quartile = 75th percentile = 0.75 quantile
Median = 2nd quartile = 50th percentile = 0.50 quantile
1st quartile = 25th percentile = 0.25 quantile
Minimum = 0th quartile = 0th percentile = 0.00 quantile

Page 31

Quartiles (undergrad classes)

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Rank  Value
10    5.1    Maximum
9     3.9
8     3.8    3rd Quartile
7     3.8
6     2.3
5     2.2
4     0
3     0      1st Quartile
2     −0.4
1     −3.1   Minimum

Median (2nd Quartile) = average of ranks 5 and 6

Q3 = 3.8
Q2 = (2.2 + 2.3)/2 = 2.25
Q1 = 0
Max = 5.1
Min = −3.1

Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
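The undergrad quartiles in the table above are medians of the ordered array and of its lower and upper halves. A minimal Python sketch follows; `undergrad_quartiles` is a hypothetical name, and it assumes that for odd n the middle observation is left out of both halves (textbooks differ on this point, and the slide's n = 10 example does not settle it).

```python
from statistics import median


def undergrad_quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, whole array, and upper half.

    Assumption: when n is odd, the middle observation belongs to
    neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(ordered), median(upper_half)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
print(undergrad_quartiles(sample))  # (0, 2.25, 3.8)
```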

Page 32

5-Number Summary and Boxplot (undergrad perspective)

Min    Q1    Q2    Q3    Max
−3.10  0.00  2.25  3.80  5.10

Median = Q2 = 2.25
Range = Max − Min = 5.10 − (−3.10) = 8.20
IQR = Q3 − Q1 = 3.80 − 0.00 = 3.80

Page 33

Terminology Warning:

Quartiles, a.k.a. Percentiles, a.k.a. Quantiles

Quartiles / Percentiles / Quantiles:
Q4 = 4th quartile = Max = 100th percentile = Q1.00 = 1.00 quantile
Q3 = 3rd quartile = 75th percentile = Q0.75 = 0.75 quantile
Q2 = 2nd quartile = Med = 50th percentile = Q0.50 = 0.50 quantile
Q1 = 1st quartile = 25th percentile = Q0.25 = 0.25 quantile
Q0 = 0th quartile = Min = 0th percentile = Q0.00 = 0.00 quantile

Page 34

Terminology Warning:

But Percentiles and Quantiles are more general

Quartiles / Percentiles / Quantiles:
Q4 = 4th quartile = Max = 100th percentile = Q1.00 = 1.00 quantile
95th percentile = Q0.95 = 0.95 quantile
Q3 = 3rd quartile = 75th percentile = Q0.75 = 0.75 quantile
60th percentile = Q0.60 = 0.60 quantile
Q2 = 2nd quartile = Med = 50th percentile = Q0.50 = 0.50 quantile
34th percentile = Q0.34 = 0.34 quantile
Q1 = 1st quartile = 25th percentile = Q0.25 = 0.25 quantile
2.5th percentile = Q0.025 = 0.025 quantile
Q0 = 0th quartile = Min = 0th percentile = Q0.00 = 0.00 quantile

Page 35

Quantile Location and Quantiles by weighted averages (graduate classes)

Step 1: $q^{\text{th}}$ Quantile Location: $L = q(n + 1)$

Step 2: $q^{\text{th}}$ Quantile: $Q = a + w(b - a)$

where a is the observation at rank ⌊L⌋ in the ordered array, b is the observation at the next rank, and w = L − ⌊L⌋ is the fractional part of L.

Example: Find the 20th percentile of the sample above.

Step 1: q = 0.20, n = 10
L = 0.20(10 + 1) = 2.2, indicating the “2.2th” observation in the ordered array.

Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are a = −0.4 and b = 0, and the weight is w = 0.2:
Q = −0.4 + 0.2(0 − (−0.4)) = −0.40 + 0.08 = −0.32

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
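The two-step weighted-average rule can be sketched in Python and checked against the worked 20th percentile and the quartile table for this sample. `weighted_quantile` is a hypothetical name; clamping locations below 1 or above n to the minimum or maximum is an assumption chosen to match the q = 0.00 and q = 1.00 rows of that table.

```python
import math


def weighted_quantile(values, q):
    """q-th quantile by the two-step rule.

    Step 1: location L = q(n + 1) in the ordered array (1-based ranks).
    Step 2: Q = a + w(b - a), where a and b are the observations at ranks
            floor(L) and floor(L) + 1 and w is the fractional part of L.
    Assumption: locations below 1 or above n are clamped to the
    minimum or maximum.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    location = q * (n + 1)
    if location <= 1:
        return ordered[0]
    if location >= n:
        return ordered[-1]
    rank = math.floor(location)
    w = location - rank
    a, b = ordered[rank - 1], ordered[rank]  # 1-based ranks -> 0-based indices
    return a + w * (b - a)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
print(round(weighted_quantile(sample, 0.20), 3))  # -0.32
```

This location rule, q(n + 1), is the same one Python's `statistics.quantiles` uses with its default `method="exclusive"`.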

Page 36

Quantile Location and Quantiles by weighted averages (graduate classes)

Step 2: a = −0.4, b = 0, w = 0.2

Q = a + w(b − a)
  = −0.4 + 0.2(0 − (−0.4))
  = −0.4 + 0.2(0.4)
  = −0.40 + 0.08
  = −0.32

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

[Figure: number line from a = −0.4 to b = 0 (distance 0.4), with Q = −0.32 marked]

Page 37

Quantile Location and Quantiles

Example: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Value  Rank
5.1    10
3.9    9
3.8    8
3.8    7
2.3    6
2.2    5
0      4
0      3
−0.4   2
−3.1   1

Quantile rank q; Quantile Location L; Quantile Q (Common Name):
q = 1.00: L = n = 10; Q = 5.1 (Maximum)
q = 0.75: L = 0.75(10 + 1) = 8.25; Q = 3.8 + 0.25(3.9 − 3.8) = 3.825 (3rd Quartile)
q = 0.50: L = 0.5(10 + 1) = 5.5; Q = 2.2 + 0.5(2.3 − 2.2) = 2.25 (Median, or 2nd Quartile)
q = 0.25: L = 0.25(10 + 1) = 2.75; Q = −0.4 + 0.75[0 − (−0.4)] = −0.1 (1st Quartile)
q = 0.00: L = 1; Q = −3.1 (Minimum)

Page 38

5-Number Summary and Boxplot using weighted averages for quantiles

Min    Q1     Q2    Q3     Max
−3.10  −0.10  2.25  3.825  5.10

Median = Q2 = 2.25
Range = Max − Min = 5.10 − (−3.10) = 8.20
IQR = Q3 − Q1 = 3.825 − (−0.10) = 3.925

Note the slightly different quartiles (and hence IQR) obtained by using weighted averages.
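Python's `statistics.quantiles` (Python 3.8+) offers two interpolation rules, which makes the dependence on convention easy to see. `method="exclusive"` uses the same q(n + 1) location rule as the weighted-average slides; `method="inclusive"` uses a different rule. For this particular sample the inclusive rule happens to reproduce the undergrad quartiles (0.00, 2.25, 3.80), but the two are not the same in general. `five_number_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def five_number_summary(values, method="exclusive"):
    """(Min, Q1, Q2, Q3, Max) using statistics.quantiles for the quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method=method)
    return min(values), q1, q2, q3, max(values)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
for method in ("exclusive", "inclusive"):
    mn, q1, q2, q3, mx = five_number_summary(sample, method)
    print(f"{method:>9}: Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}, "
          f"range = {mx - mn:.2f}, IQR = {q3 - q1:.3f}")
```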

Page 39

Dispersion: IQR (Inter-Quartile Range)

• (3rd Quartile) − (1st Quartile)
• Robust against outliers
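A quick Python check of the robustness claim: replacing the sample maximum with a gross error (5.1 recorded as 51, a hypothetical mistake) inflates the SD several-fold, while the IQR does not change because the observations at the quartile ranks are untouched. Quartiles use the q(n + 1) rule via `statistics.quantiles`; `iqr` is a hypothetical helper name.

```python
from statistics import quantiles, stdev


def iqr(values):
    """Inter-quartile range Q3 - Q1, with quartiles located at q(n + 1)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
# Hypothetical gross error: the maximum 5.1 recorded as 51.
with_outlier = [51 if y == 5.1 else y for y in sample]

print(f"SD : {stdev(sample):.2f} -> {stdev(with_outlier):.2f}")
print(f"IQR: {iqr(sample):.3f} -> {iqr(with_outlier):.3f}")
```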

Page 40

Interpretation of IQR

[Figure: distribution of pH, 6.5 to 9]

n = 200, SD = 0.41, Median = 7.6, Mean = 7.6, IQR = 0.54

For a normal distribution, Median ± 2 IQR includes 99.3% of observations.
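The 99.3% figure can be verified with `statistics.NormalDist` (Python 3.8+): a normal distribution's IQR is about 1.349 SD, so Median ± 2 IQR spans about ±2.7 SD. The sketch then applies the interval to the reported pH summary; that interpretation assumes the pH distribution is roughly normal.

```python
from statistics import NormalDist

z = NormalDist()                                # standard normal: mean 0, SD 1
iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)   # IQR of a normal, about 1.349 SD

# Proportion of a normal distribution within Median +/- 2 IQR
# (for a normal distribution the median equals the mean).
share = z.cdf(2 * iqr_in_sd) - z.cdf(-2 * iqr_in_sd)
print(f"IQR = {iqr_in_sd:.3f} SD; Median +/- 2 IQR covers {share:.1%}")

# Applied to the reported pH summary (median 7.6, IQR 0.54):
median_ph, iqr_ph = 7.6, 0.54
low, high = median_ph - 2 * iqr_ph, median_ph + 2 * iqr_ph
print(f"pH interval: {low:.2f} to {high:.2f}")
```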


Page 43

Shape: Symmetry and Skewness

• “Symmetry” means bilateral symmetry, skewness = 0
  – Mean = Median (approximately)
• Positive Skewness (asymmetric “tail” in positive direction)
  – Mean > Median (typically)
• Negative Skewness (asymmetric “tail” in negative direction)
  – Mean < Median (typically)

Comparison of mean and median tells us about shape
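A Python sketch connecting the skewness sign to the mean/median comparison. It uses the moment coefficient of skewness, g1 = m3 / m2^1.5, which is one common definition (an assumption; the slide names no formula), on a hypothetical right-tailed sample and its mirror image. `moment_skewness` is a hypothetical helper name.

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    values = list(values)
    n = len(values)
    ybar = sum(values) / n
    m2 = sum((y - ybar) ** 2 for y in values) / n
    m3 = sum((y - ybar) ** 3 for y in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    return m3 / m2 ** 1.5


right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical, long positive tail
left_tail = [-y for y in right_tail]    # mirror image, long negative tail

for label, data in [("positive", right_tail), ("negative", left_tail)]:
    print(f"{label}: skewness = {moment_skewness(data):.2f}, "
          f"mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The rule of thumb usually holds, but mean and median can disagree with the skewness sign for some multimodal or irregular samples, so the comparison is a hint about shape rather than a test.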

Page 44

Bear Creek below Town of Wise STP

[Figure: distribution of pH, 6.5 to 9]

Page 45

Outlier Box Plot

[Figure: annotated outlier box plot of pH (6.5 to 9), labeling the Outliers, the upper and lower Whiskers, the Median, the 75th %-tile = 3rd Quartile, the 25th %-tile = 1st Quartile, and the IQR]
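The slide does not state how outliers are identified. A common convention (Tukey's) flags points more than 1.5 IQR beyond the quartiles, with whiskers ending at the most extreme observations inside those fences. The Python sketch below assumes that rule and q(n + 1) quartiles; `box_plot_parts` and the outlier value 51 are hypothetical.

```python
from statistics import quantiles


def box_plot_parts(values, k=1.5):
    """Quartiles, whisker ends, and flagged outliers for an outlier box plot.

    Assumption: points more than k * IQR beyond the quartiles are flagged
    (k = 1.5 is Tukey's common convention). Whiskers end at the most
    extreme observations inside those fences.
    """
    q1, med, q3 = quantiles(values, n=4, method="exclusive")
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    inside = [y for y in values if low_fence <= y <= high_fence]
    outliers = sorted(y for y in values if y < low_fence or y > high_fence)
    return {
        "q1": q1, "median": med, "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


sample = [0, -3.1, -0.4, 0, 2.2, 51, 3.8, 3.8, 3.9, 2.3]  # 51: hypothetical outlier
print(box_plot_parts(sample))
```

Here the fences are about −5.99 and 9.71, so 51 is flagged and the whiskers end at −3.1 and 3.9.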

Page 46

Wise, VA, below STP

[Figure: distributions of pH and TKN (mg/l)]

Page 47

Wise, VA below STP

[Figure: distributions of DO (% saturation) and BOD (mg/l)]

Page 48

Wise, VA below STP

[Figure: distributions of Total Phosphorus (mg/l) and Fecal Coliforms]