1
Trend Analysis
• Step vs. monotonic trends;
• approaches to trend testing;
• trend tests with and without exogenous variables;
• dealing with seasonality;
• Introduction to time series analysis;
• Step trends
2
Testing for Trends
Purpose:
To determine if a series of observations of a random variable is generally increasing or decreasing with time
Or, has probability distribution changed with time?
Also, we may want to describe the amount or rate of change, in terms of some central value of the distribution such as the mean or median.
3
Monotonic Trend vs. Step Trend - Some Rules
Situation | Monotonic | Step
Long record with a known event that naturally divides the period of record into a “pre” and “post” period. | - | X
Record broken into two segments with a long gap between them. | - | X
Unbroken or nearly unbroken long record. | X | -
Multiple records with a variety of lengths and timing of data gaps. | X | -
Unbroken record that shows a sudden jump in magnitude of r.v. for no known reason. | X | -
4
Approaches to Monotonic Trend Testing
• Where Y = r.v. of interest in the trend test (e.g. conc., biomass, etc.)
X = an exogenous variable expected to affect Y, (e.g. flow rate, etc.)
R = residuals from a regression or LOWESS of Y vs. X
T = time (often expressed in years)
Type | Not Adjusted for X | Adjusted for X
Nonparametric | Mann-Kendall trend test on Y | Mann-Kendall trend test on residuals R from LOWESS of Y on X
Mixed | - | Mann-Kendall trend test on residuals R from regression of Y on X
Parametric | Regression of Y on T | Regression of Y on X and T
5
Trend tests with No Exogenous Variable
• Nonparametric Mann-Kendall test
same test as Kendall’s τ (discussed in the next few slides)
test is invariant to power transformation.
Kendall’s S statistic is computed from the Y, T data pairs.
H0 of no change is rejected when S (and therefore
Kendall’s τ of Y vs T) is significantly different from zero.
If H0 rejected, we conclude that there is a monotonic trend
in Y over time T.
6
Kendall’s Tau (τ)
• Tau (τ) measures the strength of the monotonic relationship between X and Y. Tau is a rank-based procedure and is therefore resistant to the effect of a small number of unusual values.
• Because τ depends only on the ranks of the data and not the values themselves, it can be used even in cases where some of the data are censored.
• In general, for linear associations, τ < r. Strong linear correlations of r > 0.9 correspond to τ > 0.7.
• Tau is easy to compute by hand, resistant to outliers, measures all monotonic correlations, and is invariant to power transformations of X or Y or both.
7
Computation of Tau (τ)
• First order all data pairs by increasing x. If a positive correlation exists, the y’s will increase more often than decrease as x increases.
• For a negative correlation, the y’s will decrease more often than increase.
• If no correlation exists, the y’s will increase and decrease about the same number of times.
• A 2-sided test for correlation will evaluate:
– Ho: no correlation exists between x and y (τ = 0)
– Ha: x and y are correlated (τ ≠ 0)
8
• The test statistic S measures the monotonic dependence of y on x:
– S = P - M
– where P = # of (+), the # of times the y’s increase as the x’s increase, or the # of yi < yj for all i < j.
– M = # of (-), the # of times the y’s decrease as the x’s increase, or the # of yi > yj for all i < j.
– i = 1, 2, …, (n-1); and j = (i+1), …, n.
• There are n(n-1)/2 possible comparisons to be made among the n data pairs. If all y values increase with the x values, S = n(n-1)/2 and τ = +1; if all y values decrease, S = -n(n-1)/2 and τ = -1. Therefore dividing S by n(n-1)/2 gives -1 ≤ τ ≤ +1.
9
• Hence the definition of τ is:
$$\tau = \frac{S}{n(n-1)/2}$$
• To test for the significance of τ, S is compared to what would be expected when the null hypothesis is true. If it is further from 0 than expected, Ho is rejected.
• For n <= 10, an exact test should be computed. The table of exact critical values is given in Table 1. For n > 10, we can use a large sample approximation for the test statistic.
10
11
Large sample approximation
• The large sample approximation Zs is given by:
$$Z_s = \begin{cases} \dfrac{S-1}{\sigma_s} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sigma_s} & \text{if } S < 0 \end{cases}$$
• where:
$$\sigma_s = \sqrt{(n/18)(n-1)(2n+5)}$$
• The null hypothesis is rejected at significance level α if |Zs| > Zcrit, where Zcrit is the critical value of the standard normal distribution with probability of exceedance of α/2.
12
Example: 10 pairs of x and y are given below, ordered by increasing x:
y : 1.22 2.20 4.80 1.28 1.97 1.46 2.64 2.34 4.84 2.96
x: 2 24 99 197 377 544 632 3452 6587 53170
[Figure: scatter plot of the 10 data pairs, with y (0 to 6) on the horizontal axis and x (0 to 60000) on the vertical axis; one point is labelled “Outlier”.]
13
• To compute S, first compare y1 = 1.22 with all subsequent y’s.
• 2.20 > 1.22, hence +
• 4.80 > 1.22 hence +, etc.
• Move on to i=2, and compare y2 =2.20 to all subsequent y’s.
• 4.80 > 2.20, hence +
• 1.28 < 2.20 hence -, etc.
• For i=2, there are 5 +’s and 3 -’s. It is convenient to write all + and - below their respective yi, as shown on the next slide.
• In total there are 33 +’s (P=33) and 12 -’s (M=12). Therefore:
• S = 33-12 = 21, and there are 10(9)/2 = 45 possible comparisons, so τ = 21/45 = 0.47. From Table 1, for n = 10 and S = 21, the exact p-value is 2(0.036) = 0.072.
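The exact p-value read from Table 1 can be cross-checked by counting permutations. The sketch below is not taken from the slides: it assumes no ties, uses the fact that under H0 every ordering of the y's is equally likely, and counts orderings by their number of discordant pairs. The function name `exact_p_one_sided` is hypothetical.

```python
from math import factorial

def exact_p_one_sided(s, n):
    """P(S >= s) under H0 for n untied observations (assumes s >= 0)."""
    max_pairs = n * (n - 1) // 2
    # counts[k] = number of orderings of the items so far with k discordant pairs
    counts = [1]
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for add in range(m):  # placing the m-th item adds 0..m-1 discordant pairs
                new[k + add] += c
        counts = new
    # S = max_pairs - 2 * (discordant pairs), so S >= s means M <= (max_pairs - s) / 2
    limit = (max_pairs - s) // 2
    return sum(counts[: limit + 1]) / factorial(n)

p10 = exact_p_one_sided(21, 10)  # one-sided value, close to the 0.036 quoted from Table 1
p7 = exact_p_one_sided(19, 7)    # one-sided value for n = 7, S = 19
print(p10, 2 * p10, p7)
```

Doubling the one-sided value gives the two-sided p-value, as done on the slide.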
14
Table of + and - signs
Each column lists, from top to bottom, the sign of the comparison of that yi with each subsequent y.
yi :  1.22  2.20  4.80  1.28  1.97  1.46  2.64  2.34  4.84  2.96
       +     +     -     +     -     +     -     +     -
       +     -     -     +     +     +     +     +
       +     -     -     +     +     +     +
       +     -     -     +     +     +
       +     +     -     +     +
       +     +     +     +
       +     +     -
       +     +
       +
– 33 (+) and 12 (-), S = 33-12 = 21
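The counting of + and - signs can be reproduced with a short script. This is a minimal sketch, assuming no tied x values (ties in y are simply counted as neither + nor -); the function name `kendall_s` is hypothetical, and the data are the 10 pairs of the example ordered by increasing x.

```python
def kendall_s(x, y):
    """Return (P, M, S, tau) from all pairwise comparisons of y, ordered by x."""
    ys = [yi for _, yi in sorted(zip(x, y))]  # order the pairs by increasing x
    n = len(ys)
    plus = minus = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if ys[j] > ys[i]:
                plus += 1    # y increases as x increases
            elif ys[j] < ys[i]:
                minus += 1   # y decreases as x increases
    s = plus - minus
    tau = s / (n * (n - 1) / 2)
    return plus, minus, s, tau

x = [2, 24, 99, 197, 377, 544, 632, 3452, 6587, 53170]
y = [1.22, 2.20, 4.80, 1.28, 1.97, 1.46, 2.64, 2.34, 4.84, 2.96]
P, M, S, tau = kendall_s(x, y)
print(P, M, S, round(tau, 2))  # 33 12 21 0.47
```

Because only the ordering of x is used, the very large x value has no more influence on S than any other point.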
15
Large sample approximation
• The large sample approximation is:
$$Z_s = \frac{21-1}{\sqrt{(10/18)(10-1)(20+5)}} = 1.79$$
• From the table of the normal distribution, the 1-sided quantile for 1.79 is 0.963, so that p = 2(1-0.963) = 0.074.
• The large sample approximation is quite good even for a small sample of size 10.
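The arithmetic above can be checked with the standard library. This is a sketch under the no-ties assumption of the slides; the function names are hypothetical, and the normal tail probability is obtained from `math.erfc` rather than a printed table.

```python
from math import sqrt, erfc

def mann_kendall_z(s, n):
    """Large-sample Z for Kendall's S with the +/-1 continuity correction."""
    sigma = sqrt((n / 18) * (n - 1) * (2 * n + 5))
    if s > 0:
        return (s - 1) / sigma
    if s < 0:
        return (s + 1) / sigma
    return 0.0

def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal variable."""
    return erfc(abs(z) / sqrt(2))

z = mann_kendall_z(21, 10)   # S and n from the worked example
p = two_sided_p(z)
print(round(z, 2), round(p, 3))
```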
16
Kendall-Theil Robust Line (Non-parametric)
• The K-T Robust line is related to Kendall’s correlation coefficient tau (τ) and is applicable when Y is linearly related to X.
• This line is not:
– dependent on the normality of residuals for the validity of significance tests,
– strongly affected by outliers.
• The Kendall-Theil line is of the form:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
17
• This line is closely related to Kendall’s τ, in that the significance test for H0: slope β1 = 0 is identical to the test for H0: τ = 0.
• The slope estimate is computed by comparing each data pair to all others in a pairwise fashion.
• The median of all pairwise slopes is taken to be the non-parametric estimate of slope β1:
$$\hat{\beta}_1 = \operatorname{median}\left(\frac{Y_j - Y_i}{X_j - X_i}\right) \quad \text{for all } i < j$$
• The intercept is defined as follows:
$$\hat{\beta}_0 = Y_{med} - \hat{\beta}_1 X_{med}$$
18
• Where Ymed and Xmed are the medians of Y and X. The formula ensures that the fitted line goes through the point (Xmed, Ymed). This is analogous to OLS, where the fitted line always goes through the means of X and Y.
Example 1: Given the following 7 data pairs:
Y: 1 2 3 4 5 16 7
X: 1 2 3 4 5 6 7
Slopes (first row: adjacent points; each later row: points one step further apart):
1 1 1 1 11 -9
1 1 1 6 1
1 1 4.3 1
1 3.5 1
3 1
1
There are n(n-1)/2 = 21 pairwise slopes.
19
Test of Significance
• The test is identical to Kendall’s τ. That is, first compute S, then check Table 1 if n <= 10, or use the large sample approximation for n > 10.
• For the example, S = 20-1 = 19, and there are 21 pairwise slopes, so τ = 19/21 = 0.90. From Table 1, with n = 7 and S = 19, the exact 2-sided p-value is 2(0.0014) = 0.003.
• Note: If the Y value was 60 instead of 16, a clear outlier, the estimate of the slope would not change. This shows that the Kendall-Theil line is resistant to outliers.
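The Kendall-Theil estimates and the outlier note can be illustrated as follows. This is a sketch, not the authors' code: the function name is hypothetical, and the data are assumed to be the seven pairs of Example 1 (X = 1..7, Y = 1, 2, 3, 4, 5, 16, 7) as read from the slide.

```python
from statistics import median

def kendall_theil_line(x, y):
    """Median of pairwise slopes, and the intercept through (Xmed, Ymed)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x) - 1)
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # skip pairs with equal x, whose slope is undefined
    ]
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5, 16, 7]
slope, intercept = kendall_theil_line(x, y)

# Replace 16 by 60 to mimic the "clear outlier" case.
y_outlier = [1, 2, 3, 4, 5, 60, 7]
slope_outlier, intercept_outlier = kendall_theil_line(x, y_outlier)
print(slope, intercept, slope_outlier, intercept_outlier)
```

Fifteen of the 21 pairwise slopes equal 1 in both cases, so the median slope does not move when the sixth Y value is inflated.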
20
Parametric Regression of Y on T
Simple regression of Y on T is a test for trend:
$$Y = \beta_0 + \beta_1 T + \varepsilon$$
H0 is that the slope coefficient β1 = 0.
All assumptions of regression must be met - normality of
residuals, constant variance, linearity of relationship, and
independence. Need to transform Y if assumptions not met.
If H0 is rejected, we conclude that there is a linear trend in Y
over time T.
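A minimal sketch of the parametric test, using ordinary least squares formulas and the standard library only. The data values and the function name are hypothetical. The returned t statistic would be compared with a Student's t critical value on n-2 degrees of freedom; the sketch does not compute a p-value and does not check the regression assumptions listed above.

```python
from math import sqrt

def trend_regression(t, y):
    """OLS fit of Y on T; returns (b0, b1, t statistic for H0: slope = 0)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b0, b1, b1 / se_b1

years = [1, 2, 3, 4, 5]                 # hypothetical time values
conc = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical observations
b0, b1, t_stat = trend_regression(years, conc)
print(round(b0, 2), round(b1, 2), round(t_stat, 1))
```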
21
Comparison of Simple Tests for Trends
If regression assumptions are OK, then regression is best. Also good if there is more than one exogenous variable.
If assumptions of regression not met (outliers, censored, non-normal, etc.) Mann-Kendall will be OK or better.
Transformation of Y will affect regression, but not Mann-Kendall.
Best to try both methods.
22
23
24
Accounting for Exogenous Variables
Exogenous variable - variable other than time trend that
may have influence on Y. These variables are usually
natural, random phenomena such as rainfall, temperature
or streamflow.
By removing the variation in Y caused by these variables, the
background variability or “noise” is reduced so that any
trend “signal” present is not masked. The ability of a trend
test to discern changes in Y with T is then increased.
25
Removal process involves modelling, and thus explaining the
effect of exogenous variables with regression or LOWESS.
When removing the effect of one or more exogenous variables
X, the probability distribution of the X’s is assumed to be
unchanged over the period of record.
If the probability distribution of X has changed, a trend in the
residuals may not necessarily be due to a trend in Y. Need to be
careful of what is chosen as exogenous variable.
26
Nonparametric approach - LOWESS
LOWESS - describes the relationship between Y and X without assuming linearity or normality of residuals.
LOWESS pattern should be smooth enough that it doesn’t have several local minima and maxima, but not so smooth as to eliminate the true change in slope.
LOWESS residuals: $R = Y - \hat{Y}$
Then, Kendall S statistic is computed from R and T pairs to test for trend.
27
Mixed Approach:
First do regression of Y on X (can have more than one X).
Check all regression assumptions: normality, linearity,
constant variance, significant β1, etc.
Then compute the residuals (from regression): $R = Y - \hat{Y}$
Then Kendall S is computed from R, T pairs to test for
trend.
28
Parametric approach
Uses regression of Y on T and X in one go.
$$Y = \beta_0 + \beta_1 T + \beta_2 X + \varepsilon$$
This tests for trend and simultaneously compensates for the
effects of exogenous variables.
Must check for assumptions of regression. If β1 is
significantly different from zero, then there is trend. β2
should be significant as well. Otherwise no point
including X.
29
30
31
32
Comparison of approaches
Use LOWESS if there is nonlinearity.
No need to check assumptions closely when using
LOWESS.
No need to transform data to achieve linearity with
LOWESS.
If assumptions of regression OK, then regression is a one-
step process with maximum efficiency.
33
Dealing with Seasonality
Different seasons of the year may be a major source of
variation in the Y variable.
As with other exogenous variables, seasonal variation must
be compensated for or “removed” in order to better discern
the trend in Y over time.
May also be interested in modelling seasonality to allow
predictions of Y for different seasons.
34
Techniques for Dealing with Seasonality
Type | Not Adjusted for X | Adjusted for X
Nonparametric | Seasonal Kendall test for trend on Y (Method 1) | Seasonal Kendall trend test on R from LOWESS of Y on X (Method 1)
Mixed | Regression of deseasonalized Y on T (Method 2b) | Seasonal Kendall trend test on R from regression of Y on X (Method 2a)
Parametric | Regression of Y on T and seasonal terms (Method 3) | Regression of Y on X, T, and seasonal terms (Method 3)
35
Nonparametric method: Seasonal Kendall Test (Method 1)
Accounts for seasonality by computing Mann-Kendall test on each of m seasons separately, then combining the results.
For monthly seasons, January data are compared only with January, February only with February, etc.
$$S_k = \sum_{i=1}^{m} S_i$$
36
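If the product of number of years and number of seasons > 25, the normal distribution can be used:
$$Z_{sk} = \begin{cases} \dfrac{S_k - 1}{\sigma_{sk}} & \text{if } S_k > 0 \\ 0 & \text{if } S_k = 0 \\ \dfrac{S_k + 1}{\sigma_{sk}} & \text{if } S_k < 0 \end{cases}$$
$$\sigma_{sk} = \sqrt{\sum_{i=1}^{m} (n_i/18)(n_i - 1)(2n_i + 5)}$$
If |Zsk| > Zcrit then reject null hypothesis of no trend.
Zcrit = 1.96 for α = 0.05.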
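The Seasonal Kendall computation can be sketched by summing the per-season S values and variances. This is illustrative only: the function names, season labels and data are hypothetical (3 seasons by 9 years, each season strictly increasing), ties are not handled, and the seasons are treated as independent.

```python
from math import sqrt

def kendall_s(values):
    """Kendall's S for one season's values listed in year order."""
    n = len(values)
    return sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )

def seasonal_kendall(by_season):
    """Return (S_k, Z_sk); by_season maps a season label to values in year order."""
    s_k = 0
    var_k = 0.0
    for values in by_season.values():
        n_i = len(values)
        s_k += kendall_s(values)                          # only within-season comparisons
        var_k += (n_i / 18) * (n_i - 1) * (2 * n_i + 5)   # variance of S_i, no ties
    sigma_sk = sqrt(var_k)
    if s_k > 0:
        z_sk = (s_k - 1) / sigma_sk
    elif s_k < 0:
        z_sk = (s_k + 1) / sigma_sk
    else:
        z_sk = 0.0
    return s_k, z_sk

data = {
    "season_1": [10 + year for year in range(9)],
    "season_2": [30 + year for year in range(9)],
    "season_3": [20 + year for year in range(9)],
}
s_k, z_sk = seasonal_kendall(data)
print(s_k, round(z_sk, 2), abs(z_sk) > 1.96)
```

Values from different seasons are never compared with each other, so the large differences in level between the three hypothetical seasons do not affect S_k.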
37
Estimate of trend slope
Trend slope of Y over time T = median of all slopes
between data pairs within the same season.
No cross season slopes contribute to the overall estimate of
the Seasonal Kendall trend slope.
Exogenous Variable
Use LOWESS of Y on X to get R, then apply Seasonal
Kendall on R, T.
38
Mixture Methods
Method 2a
Apply seasonal Kendall test to R from a regression of Y on
X. Must check for violation of regression assumptions.
Method 2b
Deseasonalize data by subtracting seasonal medians from
all data within the season, and then regressing
deseasonalized data against T. Less power to detect trend.
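The deseasonalizing step of Method 2b can be sketched as below. Only the subtraction of seasonal medians is shown; the regression of the result against T would follow. The function name, season labels and values are hypothetical.

```python
from statistics import median

def deseasonalize(by_season):
    """Subtract each season's median from all values observed in that season."""
    result = {}
    for season, values in by_season.items():
        centre = median(values)
        result[season] = [v - centre for v in values]
    return result

observations = {"wet": [5, 7, 9], "dry": [1, 2, 6]}   # hypothetical data
adjusted = deseasonalize(observations)
print(adjusted)
```

After this step every season has a median of zero, so remaining differences over time are no longer dominated by the season in which a value was observed.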
39
Parametric Method (Method 3)
Multiple regression with periodic functions to describe seasonality:
$$Y = \beta_0 + \beta_1 \sin(2\pi T) + \beta_2 \cos(2\pi T) + \beta_3 T + \text{other terms} + \varepsilon$$
Other terms = exogenous variables or dummy variables.
If β3 is significant, then there is trend.
The term 2πT = 6.2832·t when t is in years,
= 0.5236·m when m is in months,
= 0.0172·d when d is in days.
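The periodic terms and the unit conversions can be checked numerically. This sketch assumes that β1 and β2 multiply the sine and cosine terms and β3 multiplies T, consistent with the statement that a significant β3 indicates trend; the function name `design_row` is hypothetical and the regression itself is not fitted here.

```python
from math import pi, sin, cos

def design_row(t_years):
    """Regressors [1, sin(2*pi*T), cos(2*pi*T), T] for time T in decimal years."""
    angle = 2 * pi * t_years
    return [1.0, sin(angle), cos(angle), t_years]

# Multipliers for the angle when time is counted in years, months or days.
per_year = round(2 * pi, 4)
per_month = round(2 * pi / 12, 4)
per_day = round(2 * pi / 365, 4)
print(per_year, per_month, per_day)  # 6.2832 0.5236 0.0172

row = design_row(0.25)  # a quarter of the way through the year
print(row)
```

Each observation contributes one such row; the other terms (exogenous or dummy variables) would be appended as extra columns.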
40
Comparison of methods
Mann-Kendall and mixed approaches applicable to
univariate data. Cannot be used for multiple Xs. Good for
nonnormal data.
Multiple regression does it all in one swoop. Fewer
parameters but constrained by functional form (sine and
cosine). Need close checking of regression assumptions.
Can provide seasonal summary statistics.
41
Presenting Seasonal Effects
Ranking | Graphical Methods | Tabular Methods
Best | Boxplots by season, or LOWESS of Y vs. T | List the amplitude and peak day of cycle
Next Best | - | List of seasonal medians and seasonal IQR, or list of distribution percentage points by season
Worst | Plot seasonal means with standard error bars around them | List of seasonal means, standard deviations, or standard errors.
42
Introduction to Time Series Analysis
When the Y or R values are dependent in time (auto or
serial correlation).
Two purposes: a) Modelling and Simulation
b) Forecasting
Modelling and Simulation: ARIMA, Fourier + ARMA,
Dynamic Regression
Forecasting: ARIMA, Exponential Smoothing, Dynamic Regression
(Need a separate course to cover this topic)
E.g. $Y_t = a + bY_{t-1} + cY_{t-2} + dX_t + eX_{t-1}$
43
Step Trends
Step Trends without Seasonality
Type | Not Adjusted for X | Adjusted for X
Nonparametric | Rank-sum test on Y | Rank-sum test on R from LOWESS of Y on X
Mixed | - | Rank-sum test on R from regression of Y on X
Parametric | Two-sample t-test | ANCOVA of Y on X and group (before/after)
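For the nonparametric, unadjusted cell of the table, a rank-sum test on the "before" and "after" groups might look like the sketch below. The slides do not give the formulas; this uses the usual large-sample normal approximation with a continuity correction and no tie correction to the variance, which is rough for groups this small (an exact test would normally be preferred). The function name and data are hypothetical.

```python
from math import sqrt, erfc

def rank_sum_test(before, after):
    """Return (W, Z, two-sided p) where W is the sum of ranks of the 'after' group."""
    combined = sorted(before + after)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j for tied values
        i = j
    n, m = len(after), len(before)
    w = sum(ranks[v] for v in after)
    mu = n * (n + m + 1) / 2
    sigma = sqrt(n * m * (n + m + 1) / 12)
    if w > mu:
        z = (w - 0.5 - mu) / sigma
    elif w < mu:
        z = (w + 0.5 - mu) / sigma
    else:
        z = 0.0
    return w, z, erfc(abs(z) / sqrt(2))

before = [1, 2, 3, 4]   # hypothetical pre-event values
after = [5, 6, 7, 8]    # hypothetical post-event values
w, z, p = rank_sum_test(before, after)
print(w, round(z, 2), round(p, 3))
```

The same function could be applied to residuals R instead of Y for the "Adjusted for X" column.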
44
Step Trends with Seasonality
Type | Not Adjusted for X | Adjusted for X
Nonparametric | Seasonal rank-sum test on Y | Seasonal rank-sum test on R from LOWESS of Y on X
Mixed | Two-sample t-test on deseasonalized Y | Seasonal rank-sum test on R from regression of Y on X
Parametric | ANCOVA of Y on seasonal terms and group | ANCOVA of Y on X, seasonal terms and group
45
Summary
• First decide the type of trend to be analyzed
– step vs monotonic
– check assumptions
• nonparametric vs parametric
• Are there exogenous variables?
– Remove them first or model in one go
• Seasonality?
• Always plot the data - Boxplots, X-Y plots are most useful.