Hydrologic Statistics
READING: CHAPTER 11 IN APPLIED HYDROLOGY
SOME SLIDES BY VENKATESH MERWADE
04/04/2006
Hydrologic Models
Deterministic (e.g., rainfall-runoff analysis)
◦ Analysis of hydrological processes using deterministic approaches
◦ Hydrological parameters are based on physical relations of the various components of the hydrologic cycle
◦ Do not consider randomness; a given input produces the same output
Stochastic (e.g., flood frequency analysis)
◦ Probabilistic description and modeling of hydrologic phenomena
◦ Statistical analysis of hydrologic data
Classification based on randomness.
Probability
A measure of how likely an event is to occur
A number expressing the ratio of favorable outcomes to all possible outcomes
Probability is usually represented as P(.)
◦ P (getting a club from a deck of playing cards) = 13/52 = 0.25 = 25 %
◦ P (getting a 3 after rolling a die) = 1/6
Random Variable
Random variable: a quantity used to represent probabilistic uncertainty
◦ Incremental precipitation
◦ Instantaneous streamflow
◦ Wind velocity
Random variable (X) is described by a probability distribution
Probability distribution is a set of probabilities associated with the values in a random variable’s sample space
Sampling terminology
Sample: a finite set of observations x1, x2, …, xn of the random variable
A sample comes from a hypothetical infinite population possessing constant statistical properties
Sample space: set of possible samples that can be drawn from a population
Event: subset of a sample space
Example
Population: streamflow
Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
Sample: 100 observations of annual max. streamflow
Event: daily average streamflow > 100 cfs
Types of sampling
Random sampling: the likelihood of selection of each member of the population is equal
◦ Pick any streamflow value from a population
Stratified sampling: the population is divided into groups, and then random sampling is used within them
◦ Pick a streamflow value from the annual maximum series
Uniform sampling: data are selected such that the points are uniformly far apart in time or space
◦ Pick streamflow values measured on Monday at midnight
Convenience sampling: data are collected according to the convenience of the experimenter
◦ Pick streamflow during summer
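
The four sampling schemes can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the daily flow record is synthetic, the strata are taken to be calendar years, "uniform" is taken to mean every 7th day, and "summer" is taken to be days 152 to 243 of a 365-day year.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical record: 10 years of synthetic daily mean flows (cfs), 365 days each.
n_years, n_days = 10, 365
daily_flow = rng.lognormal(mean=5.0, sigma=1.0, size=(n_years, n_days))

# Random sampling: every daily value has the same chance of being picked.
random_sample = rng.choice(daily_flow.ravel(), size=20, replace=False)

# Stratified sampling: divide the record into groups (assumed here: years),
# then draw one value at random from each group.
stratified_sample = np.array([rng.choice(year) for year in daily_flow])

# Uniform sampling: values evenly spaced in time (assumed here: every 7th day).
uniform_sample = daily_flow.ravel()[::7]

# Convenience sampling: only what is easy to collect, e.g. days 152-243
# (roughly June through August) of each year.
summer_sample = daily_flow[:, 151:243].ravel()
```

Only the random and stratified samples give every part of the record a chance of selection; the convenience sample ignores all non-summer flows by construction.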
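Summary statistics
Also called descriptive statistics
◦ If x1, x2, …, xn is a sample, then
Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$; for continuous data, $\mu = \int_{-\infty}^{\infty} x\, f(x)\, dx$
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$; for continuous data, $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$
Standard deviation: $S = \sqrt{S^2}$; for continuous data, $\sigma = \sqrt{\sigma^2}$
Coeff. of variation: $CV = \frac{S}{\bar{X}}$
Here f(x) is the probability density function of the random variable.
Also included in summary statistics are median, skewness, and correlation coefficient.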
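
As a minimal sketch, the sample mean, variance, standard deviation, and coefficient of variation can be computed directly from their definitions. The flow values below are hypothetical and serve only to exercise the formulas.

```python
import numpy as np

# Hypothetical sample of annual maximum flows (10^3 cfs); not real data.
x = np.array([120.0, 85.0, 310.0, 45.0, 200.0, 150.0, 95.0, 60.0])

n = x.size
mean = x.sum() / n                             # sample mean
variance = ((x - mean) ** 2).sum() / (n - 1)   # sample variance (n - 1 divisor)
std_dev = np.sqrt(variance)                    # sample standard deviation
cv = std_dev / mean                            # coefficient of variation

print(f"mean={mean:.1f}, variance={variance:.1f}, std={std_dev:.1f}, CV={cv:.2f}")
```

NumPy's built-in `np.var` and `np.std` use a divisor of n by default; passing `ddof=1` reproduces the n - 1 divisor used here.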
Graphical display
Time series plots
Histograms/Frequency distribution
Cumulative distribution functions
Flow duration curve
Time series plot
Plot of variable versus time (bar/line/points)
Example: annual maximum flow series
[Figure: annual max flow (10^3 cfs) versus year, Colorado River near Austin]
Histogram
Plots of bars whose height is the number ni, or fraction (ni/N), of data falling into one of several intervals of equal width
[Figure: no. of occurrences versus annual max flow (10^3 cfs), shown for intervals of 50,000 cfs, 25,000 cfs, and 10,000 cfs]
Dividing the number of occurrences by the total number of points will give the probability mass function
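
The counting and normalizing steps can be sketched in Python with `numpy.histogram`. The flow data are synthetic (drawn from an arbitrary lognormal distribution), and the 50 (10^3 cfs) interval width is taken from the first histogram example; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic annual maximum flows (10^3 cfs); placeholder data, not a real record.
flows = rng.lognormal(mean=4.0, sigma=0.8, size=100)

# Equal-width intervals starting at 0 and covering the largest observation.
width = 50.0
upper = width * np.ceil(flows.max() / width)
edges = np.arange(0.0, upper + width, width)

# n_i: number of observations falling in each interval.
counts, edges = np.histogram(flows, bins=edges)

# n_i / N: fraction of observations in each interval.
relative_freq = counts / counts.sum()

for left, right, n_i, f_i in zip(edges[:-1], edges[1:], counts, relative_freq):
    print(f"{left:6.0f} to {right:6.0f}: n_i = {n_i:3d}, n_i/N = {f_i:.2f}")
```

Changing `width` to 25.0 or 10.0 shows how the interval choice changes the shape of the histogram, as in the three panels of the slide.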
Using Excel to plot histograms
1) Make sure the Analysis ToolPak is added in Tools. This will add the Data Analysis command to Tools
2) Fill one column with the data, and another with the intervals (e.g., for a 50 cfs interval, fill 0, 50, 100, …)
3) Go to Tools → Data Analysis → Histogram
4) Organize the plot in a presentable form (change fonts, scale, color, etc.)
Probability density function
Continuous form of probability mass function is probability density function
[Figure: histogram of no. of occurrences versus annual max flow (10^3 cfs), and the corresponding probability curve versus annual max flow (10^3 cfs)]
pdf is the first derivative of a cumulative distribution function
Cumulative distribution function
Cumulate the pdf to produce a cdf
The cdf describes the probability that a random variable is less than or equal to a specified value of x
[Figure: cdf, probability versus annual max flow (10^3 cfs)]
P (Q ≤ 50000) = 0.8
P (Q ≤ 25000) = 0.4
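
One way to estimate such probabilities from data is an empirical cdf: the fraction of observations less than or equal to a given value. The sketch below is an illustration only; the function name `empirical_cdf` and the ten flow values (in 10^3 cfs) are hypothetical, chosen so that the result matches the two probabilities quoted above.

```python
import numpy as np

def empirical_cdf(sample, x):
    """Fraction of observations in `sample` that are less than or equal to x."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # On a sorted array, searchsorted with side="right" counts the values <= x.
    return np.searchsorted(sample, x, side="right") / sample.size

# Hypothetical flows in 10^3 cfs; not a real record.
flows = np.array([12.0, 18.0, 22.0, 25.0, 31.0, 40.0, 47.0, 50.0, 75.0, 120.0])

p_25 = empirical_cdf(flows, 25.0)  # 4 of the 10 values are <= 25
p_50 = empirical_cdf(flows, 50.0)  # 8 of the 10 values are <= 50
print(p_25, p_50)
```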
Flow duration curve
A cumulative frequency curve that shows the percentage of time that specified discharges are equaled or exceeded.
Steps
◦ Arrange flows in chronological order
◦ Find the number of records (N)
◦ Sort the data from highest to lowest
◦ Rank the data (m = 1 for the highest value and m = N for the lowest value)
◦ Compute exceedance probability for each value using the following formula
$p = 100 \times \frac{m}{N + 1}$
◦ Plot p on x axis and Q (sorted) on y axis
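
The ranking steps above can be sketched in Python as follows. The function name `flow_duration` and the nine flow values are hypothetical; ties between equal flows are not given any special treatment in this sketch.

```python
import numpy as np

def flow_duration(flows):
    """Return exceedance probability p (%) and flows sorted from highest to lowest."""
    q = np.asarray(flows, dtype=float)
    n = q.size                          # number of records, N
    q_sorted = np.sort(q)[::-1]         # highest to lowest
    m = np.arange(1, n + 1)             # rank: 1 for the highest, N for the lowest
    p = 100.0 * m / (n + 1)             # exceedance probability in percent
    return p, q_sorted

# Hypothetical flows (cfs) in chronological order; not a real record.
flows = [30.0, 10.0, 50.0, 20.0, 40.0, 90.0, 70.0, 60.0, 80.0]
p, q_sorted = flow_duration(flows)

# Flow equaled or exceeded 50 % of the time, read off the curve by interpolation.
median_flow = np.interp(50.0, p, q_sorted)
print(median_flow)
```

Plotting `p` on the x axis against `q_sorted` on the y axis gives the flow duration curve.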
Flow duration curve in Excel
[Figure: flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]
Statistical analysis
Regression analysis
Mass curve analysis
Flood frequency analysis
Many more which are beyond the scope of this class!
Linear Regression
A technique to determine the relationship between two random variables.
◦ Relationship between discharge and velocity in a stream
◦ Relationship between discharge and water quality constituents
A regression model is given by:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
yi = ith observation of the response (dependent variable)
xi = ith observation of the explanatory (independent) variable
β0 = intercept
β1 = slope
εi = random error or residual for the ith observation
n = sample size
Least squares regression
We have x1, x2, …, xn and y1, y2, …, yn observations of the independent and dependent variables, respectively.
Define a linear model for yi,
$\hat{y}_i = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n$
Fit the model (find β0 and β1) such that the sum of the squares of the vertical deviations is minimum
◦ Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
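
For a straight line, the minimization has a well-known closed-form solution: the slope is the sum of products of deviations from the means divided by the sum of squared x deviations, and the intercept follows from the means. The sketch below applies it to a small made-up data set; the function name `least_squares_line` and the data are illustrative assumptions.

```python
import numpy as np

def least_squares_line(x, y):
    """Intercept and slope that minimize the sum of squared vertical deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    slope = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up observations of the independent (x) and dependent (y) variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = least_squares_line(x, y)
y_hat = b0 + b1 * x                    # fitted values
sse = ((y - y_hat) ** 2).sum()         # the minimized sum of squares
print(b0, b1, sse)
```

`numpy.polyfit(x, y, 1)` returns the same slope and intercept (in that order) and can serve as a cross-check.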
Linear Regression in Excel
Steps:
◦ Prepare a scatter plot
◦ Fit a trend line
[Figure: TDS (mg/L) versus specific conductance (µS/cm), with fitted trend line TDS = 0.5946(sp. cond) - 15.709, R² = 0.9903]
Alternatively, one can use Tools → Data Analysis → Regression
Data are for Brazos River near Highbank, TX
Coefficient of determination (R²)
It is the proportion of observed y variation that can be explained by the simple linear regression model
$R^2 = 1 - \frac{SSE}{SST}$
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of yi
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$: error sum of squares
The higher the value of R², the more successful the model is in explaining y variation.
If R² is small, search for an alternative model (nonlinear or multiple regression model) that can more effectively explain y variation.
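
A minimal sketch of the R² calculation, assuming a straight line fitted with `numpy.polyfit`; the function name `r_squared` and the data are made up for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination, R^2 = 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sst = ((y - y.mean()) ** 2).sum()   # total sum of squares
    sse = ((y - y_hat) ** 2).sum()      # error sum of squares
    return 1.0 - sse / sst

# Made-up observations; not the Brazos River data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.9, 8.3, 9.7, 12.2])

slope, intercept = np.polyfit(x, y, 1)  # least squares straight line
r2 = r_squared(y, intercept + slope * x)
print(r2)
```

For a least squares straight line with an intercept, this value equals the square of the correlation coefficient between x and y.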