1 trend analysis step vs. monotonic trends; approaches to trend testing; trend tests with and...


Trend Analysis

• Step vs. monotonic trends;

• approaches to trend testing;

• trend tests with and without exogenous variables;

• dealing with seasonality;

• introduction to time series analysis;

• step trends.


Testing for Trends

Purpose:

To determine whether a series of observations of a random variable is generally increasing or decreasing with time.

Or: has the probability distribution changed with time?

We may also want to describe the amount or rate of change in terms of some central value of the distribution, such as the mean or median.


Monotonic Trend vs. Step Trend: Some Rules

• Long record with a known event that naturally divides the period of record into a “pre” and “post” period: step trend.

• Record broken into two segments with a long gap between them: step trend.

• Unbroken or nearly unbroken long record: monotonic trend.

• Multiple records with a variety of lengths and timing of data gaps: monotonic trend.

• Unbroken record that shows a sudden jump in magnitude of the r.v. for no known reason: monotonic trend.


Approaches to Monotonic Trend Testing

• Where Y = r.v. of interest in the trend test (e.g. conc., biomass, etc.)

X = an exogenous variable expected to affect Y, (e.g. flow rate, etc.)

R = residuals from a regression or LOWESS of Y vs. X

T = time (often expressed in years)

• Nonparametric

– Not adjusted for X: Mann-Kendall trend test on Y

– Adjusted for X: Mann-Kendall trend test on residuals R from LOWESS of Y on X

• Mixed

– Not adjusted for X: none

– Adjusted for X: Mann-Kendall trend test on residuals R from regression of Y on X

• Parametric

– Not adjusted for X: regression of Y on T

– Adjusted for X: regression of Y on X and T


Trend tests with No Exogenous Variable

• Nonparametric Mann-Kendall test

– Same test as Kendall’s τ (discussed in the next few slides), with time T playing the role of X.

– The test is invariant to power transformations of Y.

– Kendall’s S statistic is computed from the (Y, T) data pairs.

– H0 of no change is rejected when S (and therefore Kendall’s τ of Y vs. T) is significantly different from zero.

– If H0 is rejected, we conclude that there is a monotonic trend in Y over time T.


Kendall’s Tau (τ)

• Tau (τ) measures the strength of the monotonic relationship between X and Y. Tau is a rank-based procedure and is therefore resistant to the effect of a small number of unusual values.

• Because τ depends only on the ranks of the data and not on the values themselves, it can be used even in cases where some of the data are censored.

• In general, for linear associations, |τ| < |r|. Strong linear correlations of r > 0.9 correspond to τ > 0.7.

• Tau is easy to compute by hand, resistant to outliers, measures all monotonic correlations, and invariant to power transformations of X, Y, or both.


Computation of Tau (τ)

• First order all data pairs by increasing x. If a positive correlation exists, the y’s will increase more often than they decrease as x increases.

• For a negative correlation, the y’s will decrease more often than they increase.

• If no correlation exists, the y’s will increase and decrease about the same number of times.

• A 2-sided test for correlation will evaluate:

– H0: no correlation exists between x and y (τ = 0)

– Ha: x and y are correlated (τ ≠ 0)


• The test statistic S measures the monotonic dependence of y on x:

– S = P − M

– where P = # of (+), the # of times the y’s increase as the x’s increase, i.e. the # of yi < yj for all i < j;

– M = # of (−), the # of times the y’s decrease as the x’s increase, i.e. the # of yi > yj for all i < j;

– i = 1, 2, …, (n − 1) and j = (i + 1), …, n.

• There are n(n − 1)/2 possible comparisons to be made among the n data pairs. If all y values increased along with the x values, S = n(n − 1)/2 and τ = +1. If all decreased, S = −n(n − 1)/2 and τ = −1. Therefore dividing S by n(n − 1)/2 gives −1 ≤ τ ≤ +1.


• Hence the definition of τ is:

$$\tau = \frac{S}{n(n-1)/2}$$

• To test for the significance of τ, S is compared to what would be expected when the null hypothesis is true. If it is further from 0 than expected, H0 is rejected.

• For n ≤ 10, an exact test should be computed. The table of exact critical values is given in Table 1. For n > 10, we can use a large sample approximation for the test statistic.
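The counting rule for P, M and S, and the definition of τ, can be sketched in Python. This is a minimal illustration and not a library routine. It assumes the y values are already ordered by x (or by time T). Tied pairs count as neither + nor −, and no tie correction is applied. The names `kendall_s` and `kendall_tau` are hypothetical, and the exact test from Table 1 is not implemented.

```python
def kendall_s(y):
    """Return (P, M, S) for values y already ordered by x (or by time).

    P counts pairs i < j with y[i] < y[j]; M counts pairs with y[i] > y[j].
    Tied pairs are counted in neither P nor M.
    """
    n = len(y)
    p = m = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if y[j] > y[i]:
                p += 1
            elif y[j] < y[i]:
                m += 1
    return p, m, p - m


def kendall_tau(y):
    """Kendall's tau = S / [n(n-1)/2], without any tie correction."""
    n = len(y)
    _, _, s = kendall_s(y)
    return s / (n * (n - 1) / 2)


# The 10 y values of the worked example, ordered by increasing x.
y = [1.22, 2.20, 4.80, 1.28, 1.97, 1.46, 2.64, 2.34, 4.84, 2.96]
print(kendall_s(y))              # (33, 12, 21)
print(round(kendall_tau(y), 2))  # 0.47
```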


Large sample approximation

• The large sample approximation Z_S is given by:

$$Z_S = \begin{cases} \dfrac{S-1}{\sigma_S} & \text{if } S > 0 \\[4pt] 0 & \text{if } S = 0 \\[4pt] \dfrac{S+1}{\sigma_S} & \text{if } S < 0 \end{cases}$$

• where:

$$\sigma_S = \sqrt{\frac{n}{18}\,(n-1)(2n+5)}$$

• The null hypothesis is rejected at significance level α if |Z_S| > Z_crit, where Z_crit is the critical value of the standard normal distribution with a probability of exceedance of α/2.
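Below is a sketch of the large sample approximation. It includes the ±1 continuity correction and a two-sided p-value from the standard normal distribution. The p-value is computed with `math.erfc`, using the identity 2[1 − Φ(|z|)] = erfc(|z|/√2). The sketch assumes no ties, and the function names are hypothetical.

```python
import math


def mann_kendall_z(s, n):
    """Large-sample Z for Kendall's S (no tie correction).

    Continuity correction: (S - 1)/sigma if S > 0, 0 if S == 0,
    (S + 1)/sigma if S < 0, with sigma = sqrt((n/18)(n-1)(2n+5)).
    """
    sigma = math.sqrt((n / 18.0) * (n - 1) * (2 * n + 5))
    if s > 0:
        return (s - 1) / sigma
    if s < 0:
        return (s + 1) / sigma
    return 0.0


def two_sided_p(z):
    """Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def reject_h0(s, n, alpha=0.05):
    """Reject H0 (no trend) when the two-sided p-value is below alpha."""
    return two_sided_p(mann_kendall_z(s, n)) < alpha


# Values from the worked example: n = 10, S = 21.
z = mann_kendall_z(21, 10)
print(round(z, 2))               # 1.79
print(round(two_sided_p(z), 3))  # 0.074
```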


Example: 10 pairs of x and y are given below, ordered by increasing x:

y: 1.22 2.20 4.80 1.28 1.97 1.46 2.64 2.34 4.84 2.96

x: 2 24 99 197 377 544 632 3452 6587 53170

[Figure: scatterplot of the 10 pairs, x (0 to 60000) against y (0 to 6), with one point labeled “Outlier”.]


• To compute S, first compare y1 = 1.22 with all subsequent y’s.

• 2.20 > 1.22, hence +

• 4.80 > 1.22, hence +, etc.

• Move on to i = 2, and compare y2 = 2.20 to all subsequent y’s.

• 4.80 > 2.20, hence +

• 1.28 < 2.20, hence −, etc.

• For i = 2, there are 5 +’s and 3 −’s. It is convenient to write all + and − signs for each yi together, as shown in the table of signs that follows.

• In total there are 33 +’s (P = 33) and 12 −’s (M = 12). Therefore:

• S = 33 − 12 = 21, and there are 10(9)/2 = 45 possible comparisons, so τ = 21/45 = 0.47. From Table 1, for n = 10 and S = 21, the exact two-sided p-value is 2(0.036) = 0.072.


Table of + and − signs (each yi compared with every later yj)

• y1 = 1.22: + + + + + + + + +

• y2 = 2.20: + − − − + + + +

• y3 = 4.80: − − − − − + −

• y4 = 1.28: + + + + + +

• y5 = 1.97: − + + + +

• y6 = 1.46: + + + +

• y7 = 2.64: − + +

• y8 = 2.34: + +

• y9 = 4.84: −

– 33 (+) and 12 (−), S = 33 − 12 = 21


Large sample approximation

• The large sample approximation is:

$$Z_S = \frac{21 - 1}{\sqrt{(10/18)(10-1)(2 \cdot 10 + 5)}} = \frac{20}{\sqrt{125}} = 1.79$$

• From the table of the normal distribution, the cumulative probability for 1.79 is 0.963, so that p = 2(1 − 0.963) = 0.074.

• The large sample approximation is quite good even for a small sample of size 10.


Kendall-Theil Robust Line (Non-parametric)

• The K-T robust line is related to Kendall’s correlation coefficient tau (τ) and is applicable when Y is linearly related to X.

• This line is not:

– dependent on the normality of residuals for the validity of significance tests,

– strongly affected by outliers.

• The Kendall-Theil line is of the form:

$$Y = \beta_0 + \beta_1 X$$


• This line is closely related to Kendall’s τ, in that the significance test for H0: slope β1 = 0 is identical to the test for H0: τ = 0.

• The slope estimate is computed by comparing each data pair to all others in a pairwise fashion.

• The median of all pairwise slopes is taken to be the nonparametric estimate of slope β1:

$$\hat\beta_1 = \operatorname{median}\left(\frac{Y_j - Y_i}{X_j - X_i}\right) \quad \text{for all } i < j$$

• The intercept is defined as follows:

$$\hat\beta_0 = Y_{med} - \hat\beta_1 X_{med}$$


• Where Ymed and Xmed are the medians of Y and X. The formula ensures that the fitted line goes through the point (Xmed, Ymed). This is analogous to OLS, where the fitted line always goes through the point defined by the means of X and Y.

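Example 1: Given the following 7 data pairs:

Y: 1 2 3 4 5 16 7

X: 1 2 3 4 5 6 7

There are n(n − 1)/2 = 21 pairs, and therefore 21 pairwise slopes (Yj − Yi)/(Xj − Xi). The 15 pairs that do not involve the point (X = 6, Y = 16) all have a slope of 1. The pairs that involve that point have slopes 3, 3.5, 4.33, 6, 11 and −9. The median slope is therefore 1. With Ymed = 4 and Xmed = 4, the intercept is 4 − 1(4) = 0.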


Test of Significance

• The test is identical to that for Kendall’s τ. That is, first compute S, then check Table 1 if n ≤ 10, or use the large sample approximation for n > 10.

• For the example, S = 20 − 1 = 19, and there are 21 pairwise slopes, so τ = 19/21 = 0.90. From Table 1, with n = 7 and S = 19, the exact 2-sided p-value is 2(0.0014) = 0.003.

• Note: if the Y value of 16 were replaced by 60, a clear outlier, the estimate of the slope would not change. This shows that the Kendall-Theil line is resistant to outliers.
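The slope and intercept rules above can be sketched as follows. The data assume that the seven pairs of Example 1 are X = 1 to 7 and Y = 1, 2, 3, 4, 5, 16, 7, as read from the slide. The function name is hypothetical. Pairs with equal X are skipped because their slope is undefined.

```python
from statistics import median


def kendall_theil_line(x, y):
    """Kendall-Theil robust line; returns (intercept b0, slope b1).

    b1 = median of all pairwise slopes (y_j - y_i) / (x_j - x_i), i < j.
    b0 = median(y) - b1 * median(x), so the line passes through
    (median(x), median(y)). Pairs with equal x are skipped.
    """
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[j] != x[i]:
                slopes.append((y[j] - y[i]) / (x[j] - x[i]))
    b1 = median(slopes)
    b0 = median(y) - b1 * median(x)
    return b0, b1


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5, 16, 7]
print(kendall_theil_line(x, y))          # (0.0, 1.0)

# Replacing 16 by 60 changes several pairwise slopes but not their median.
y_outlier = [1, 2, 3, 4, 5, 60, 7]
print(kendall_theil_line(x, y_outlier))  # (0.0, 1.0)
```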


Parametric Regression of Y on T

Simple regression of Y on T is a test for trend:

$$Y = \beta_0 + \beta_1 T + \varepsilon$$

H0 is that the slope coefficient β1 = 0.

All assumptions of regression must be met: normality of residuals, constant variance, linearity of relationship, and independence. Y needs to be transformed if the assumptions are not met.

If H0 is rejected, we conclude that there is a linear trend in Y over time T.
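Here is a standard-library-only sketch of the regression test. It computes the least-squares intercept and slope of Y on T. It also computes the t statistic b1/SE(b1), which would be compared with a Student's t critical value on n − 2 degrees of freedom. The data are invented for illustration, and the checks of the regression assumptions listed above are not shown.

```python
import math


def ols_trend(t, y):
    """Least-squares fit of Y = b0 + b1*T; returns (b0, b1, t_stat).

    t_stat = b1 / SE(b1) tests H0: beta1 = 0 and is compared with a
    Student's t critical value on n - 2 degrees of freedom.
    """
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    resid = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # residual variance
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, b1 / se_b1


# Hypothetical annual series.
b0, b1, t_stat = ols_trend([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(round(b0, 2), round(b1, 2), round(t_stat, 2))  # 0.6 0.8 2.31
```

With 5 − 2 = 3 degrees of freedom, the two-sided 5% critical value of t is about 3.18, so this small invented example would not reject H0.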


Comparison of Simple Tests for Trends

If regression assumptions are OK, then regression is best. It is also good if there is more than one exogenous variable.

If assumptions of regression not met (outliers, censored, non-normal, etc.) Mann-Kendall will be OK or better.

Transformation of Y will affect regression, but not Mann-Kendall.

Best to try both methods.


Accounting for Exogenous Variables

Exogenous variable: a variable other than time that may influence Y. These variables are usually natural, random phenomena such as rainfall, temperature or streamflow.

Removing the variation in Y caused by these variables reduces the background variability or “noise”, so that any trend “signal” present is not masked. The ability of a trend test to discern changes in Y with T is then increased.


The removal process involves modelling, and thus explaining, the effect of exogenous variables with regression or LOWESS.

When removing the effect of one or more exogenous variables X, the probability distribution of the X’s is assumed to be unchanged over the period of record.

If the probability distribution of X has changed, a trend in the residuals may not necessarily be due to a trend in Y. Be careful about what is chosen as an exogenous variable.


Nonparametric approach - LOWESS

LOWESS - describes the relationship between Y and X without assuming linearity or normality of residuals.

LOWESS pattern should be smooth enough that it doesn’t have several local minima and maxima, but not so smooth as to eliminate the true change in slope.

LOWESS residuals: $R = Y - \hat{Y}$, where $\hat{Y}$ is the LOWESS estimate of Y at each X.

Then Kendall’s S statistic is computed from the (R, T) pairs to test for trend.


Mixed Approach:

First do a regression of Y on X (there can be more than one X).

Check all regression assumptions: normality, linearity, constant variance, significant β1, etc.

Then compute the residuals from the regression: $R = Y - \hat{Y}$.

Then Kendall’s S is computed from the (R, T) pairs to test for trend.
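The mixed approach can be sketched with a simple linear regression of Y on a single X, followed by Kendall's S on the residuals ordered by time. The flow (x) and concentration (y) values below are invented for illustration, and the helper names are hypothetical. The regression-assumption checks described above are not shown, and times are assumed to be distinct. The resulting S would then be tested exactly as for the Mann-Kendall test.

```python
def ols_residuals(x, y):
    """Residuals R = Y - Yhat from a least-squares regression of Y on one X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def kendall_s_vs_time(t, r):
    """Kendall's S for (T, R) pairs: order R by T, then count P - M."""
    r_by_time = [ri for _, ri in sorted(zip(t, r))]
    s = 0
    for i in range(len(r_by_time) - 1):
        for j in range(i + 1, len(r_by_time)):
            d = r_by_time[j] - r_by_time[i]
            s += (d > 0) - (d < 0)
    return s


# Hypothetical data: time in years, flow (x) and concentration (y).
t = [1, 2, 3, 4, 5, 6]
x = [10, 30, 20, 40, 25, 35]
y = [3.5, 6.0, 5.5, 8.0, 7.0, 8.5]
r = ols_residuals(x, y)
print(kendall_s_vs_time(t, r))  # 7
```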


Parametric approach

Uses a regression of Y on T and X in one step:

$$Y = \beta_0 + \beta_1 T + \beta_2 X + \varepsilon$$

This tests for trend and simultaneously compensates for the effects of exogenous variables.

The assumptions of regression must be checked. If β1 is significantly different from zero, then there is a trend. β2 should be significant as well; otherwise there is no point including X.


Comparison of approaches

Use LOWESS if there is nonlinearity.

No need to check assumptions closely when using LOWESS.

No need to transform data to achieve linearity with LOWESS.

If the assumptions of regression are OK, then regression is a one-step process with maximum efficiency.


Dealing with Seasonality

Different seasons of the year may be a major source of variation in the Y variable.

As with other exogenous variables, seasonal variation must be compensated for or “removed” in order to better discern the trend in Y over time.

We may also be interested in modelling seasonality to allow predictions of Y for different seasons.


Techniques for Dealing with Seasonality

• Nonparametric

– Not adjusted for X: Seasonal Kendall test for trend on Y (Method 1)

– Adjusted for X: Seasonal Kendall trend test on R from LOWESS of Y on X (Method 1)

• Mixed

– Not adjusted for X: regression of deseasonalized Y on T (Method 2b)

– Adjusted for X: Seasonal Kendall trend test on R from regression of Y on X (Method 2a)

• Parametric

– Not adjusted for X: regression of Y on T and seasonal terms (Method 3)

– Adjusted for X: regression of Y on X, T, and seasonal terms (Method 3)


Nonparametric method: Seasonal Kendall Test (Method 1)

Accounts for seasonality by computing the Mann-Kendall test on each of the m seasons separately, then combining the results.

For monthly seasons, January data are compared only with January, February only with February, etc.

$$S_k = \sum_{i=1}^{m} S_i$$

where S_i is the Mann-Kendall S statistic for season i.


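If the product of the number of years and the number of seasons is greater than 25, the normal distribution can be used:

$$Z_{sk} = \begin{cases} \dfrac{S_k - 1}{\sigma_{S_k}} & \text{if } S_k > 0 \\[4pt] 0 & \text{if } S_k = 0 \\[4pt] \dfrac{S_k + 1}{\sigma_{S_k}} & \text{if } S_k < 0 \end{cases}
\qquad
\sigma_{S_k} = \sqrt{\sum_{i=1}^{m} \frac{n_i (n_i - 1)(2 n_i + 5)}{18}}$$

where n_i is the number of data in season i.

If |Z_sk| > Z_crit, then reject the null hypothesis of no trend. Z_crit = 1.96 for α = 0.05.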
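Below is a sketch of the Seasonal Kendall statistic. S is computed within each season only and summed to S_k. The sum is then standardized with σ_Sk and the same ±1 continuity correction. The sketch assumes no ties and applies no correction for serial correlation between seasons. The function name and the quarterly values are hypothetical. The example has 4 seasons × 7 years = 28 > 25, so the normal approximation applies.

```python
import math


def seasonal_kendall(seasons):
    """Seasonal Kendall test without tie or serial-correlation corrections.

    `seasons` is a list of sequences, one per season, each holding that
    season's values ordered by year. Returns (Sk, sigma_Sk, Zsk).
    """
    sk = 0
    var = 0.0
    for y in seasons:
        n = len(y)
        # Mann-Kendall S within this season only (no cross-season pairs).
        for i in range(n - 1):
            for j in range(i + 1, n):
                d = y[j] - y[i]
                sk += (d > 0) - (d < 0)
        var += n * (n - 1) * (2 * n + 5) / 18.0
    sigma = math.sqrt(var)
    if sk > 0:
        z = (sk - 1) / sigma
    elif sk < 0:
        z = (sk + 1) / sigma
    else:
        z = 0.0
    return sk, sigma, z


# Hypothetical quarterly data, 7 years per season.
winter = [3.1, 3.4, 3.3, 3.8, 3.9, 4.1, 4.0]
spring = [5.0, 5.2, 5.5, 5.4, 5.9, 6.1, 6.3]
summer = [7.2, 7.0, 7.6, 7.9, 8.1, 8.0, 8.4]
autumn = [4.4, 4.6, 4.5, 4.9, 5.2, 5.1, 5.5]
sk, sigma, z = seasonal_kendall([winter, spring, summer, autumn])
print(sk, round(z, 2))  # 70 5.18
```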


Estimate of trend slope

Trend slope of Y over time T = median of all slopes between data pairs within the same season.

No cross-season slopes contribute to the overall estimate of the Seasonal Kendall trend slope.
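Below is a sketch of the Seasonal Kendall slope estimator. Pairwise slopes are formed only between observations in the same season, and the median of the pooled set is the trend estimate. The function name and the two-season data are hypothetical. The data are chosen so that every within-season slope is 0.5 per year while the two seasons sit at very different levels; slopes taken across seasons would be meaningless here.

```python
from statistics import median


def seasonal_kendall_slope(seasons):
    """Median of pairwise slopes formed only within each season.

    `seasons` is a list of lists of (time, value) pairs, one list per season.
    Pairs from different seasons are never compared.
    """
    slopes = []
    for pairs in seasons:
        for i in range(len(pairs) - 1):
            for j in range(i + 1, len(pairs)):
                (ti, yi), (tj, yj) = pairs[i], pairs[j]
                if tj != ti:
                    slopes.append((yj - yi) / (tj - ti))
    return median(slopes)


# Hypothetical data: a "wet" and a "dry" season over three years.
wet = [(2001, 10.0), (2002, 10.5), (2003, 11.0)]
dry = [(2001, 2.0), (2002, 2.5), (2003, 3.0)]
print(seasonal_kendall_slope([wet, dry]))  # 0.5
```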

Exogenous Variable

Use LOWESS of Y on X to get R, then apply the Seasonal Kendall test to the (R, T) pairs.


Mixture Methods

Method 2a

Apply the Seasonal Kendall test to R from a regression of Y on X. The regression must be checked for violations of its assumptions.

Method 2b

Deseasonalize the data by subtracting each season’s median from all data within that season, then regress the deseasonalized data against T. This approach has less power to detect trend.
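Method 2b can be sketched in two steps. First, subtract each season's median from the data in that season. Second, regress the deseasonalized values on T. The helper names and the two-season data (T in decimal years) are hypothetical, and the regression diagnostics are omitted.

```python
from statistics import median


def deseasonalize(values, seasons):
    """Subtract each season's median from every value in that season.

    `values[k]` is an observation and `seasons[k]` its season label
    (e.g. month number). Returns the deseasonalized series in the same order.
    """
    meds = {lab: median([v for v, s in zip(values, seasons) if s == lab])
            for lab in set(seasons)}
    return [v - meds[s] for v, s in zip(values, seasons)]


def ols_slope(t, y):
    """Least-squares slope of y on t."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
    sxx = sum((a - t_bar) ** 2 for a in t)
    return sxy / sxx


# Hypothetical data: season 1 (wet) and season 2 (dry) over three years.
t = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
seasons = [1, 2, 1, 2, 1, 2]
values = [10, 2, 11, 3, 12, 4]
d = deseasonalize(values, seasons)
print(d)                          # [-1, -1, 0, 0, 1, 1]
print(round(ols_slope(t, d), 3))  # 0.914
```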


Parametric Method (Method 3)

Multiple regression with periodic functions to describe seasonality:

$$Y = \beta_0 + \beta_1 \sin(2\pi T) + \beta_2 \cos(2\pi T) + \beta_3 T + \text{other terms}$$

Other terms = exogenous variables or dummy variables.

If β3 is significant, then there is a trend.

The term 2πT = 6.2832·t when t is in years,

= 0.5236·m when m is in months,

= 0.0172·d when d is in days.
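Below is a standard-library-only sketch of the Method 3 fit. It is restricted to the sine, cosine and trend terms, with no other exogenous or dummy terms. It solves the least-squares normal equations by Gaussian elimination. The synthetic monthly series (T in decimal years) is invented and noise-free so that the fitted coefficients can be checked. A real analysis would also need standard errors for β3 and the usual regression diagnostics, which are not shown.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting. Works on an augmented copy; a and b are unchanged."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_seasonal_trend(t, y):
    """Least-squares fit of Y = b0 + b1 sin(2 pi T) + b2 cos(2 pi T) + b3 T.

    T is in years. Returns [b0, b1, b2, b3]; b3 is the trend slope.
    """
    rows = [[1.0, math.sin(2 * math.pi * ti), math.cos(2 * math.pi * ti), ti]
            for ti in t]
    # Normal equations (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)


# Hypothetical noise-free monthly series over 3 years.
t = [k / 12 for k in range(36)]
y = [5 + 2 * math.sin(2 * math.pi * ti) + 1 * math.cos(2 * math.pi * ti) + 0.3 * ti
     for ti in t]
b = fit_seasonal_trend(t, y)
print([round(v, 3) for v in b])  # [5.0, 2.0, 1.0, 0.3]
```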


Comparison of methods

Mann-Kendall and mixed approaches are applicable to univariate data. They cannot be used for multiple Xs. Good for nonnormal data.

Multiple regression does it all in one swoop. Fewer parameters, but constrained by the functional form (sine and cosine). Needs close checking of regression assumptions. Can provide seasonal summary statistics.


Presenting Seasonal Effects

• Best

– Graphical methods: boxplots by season, or LOWESS of Y vs. T

– Tabular methods: list the amplitude and peak day of the cycle

• Next best

– Graphical methods: none

– Tabular methods: list of seasonal medians and seasonal IQR, or list of distribution percentage points by season

• Worst

– Graphical methods: plot of seasonal means with standard error bars around them

– Tabular methods: list of seasonal means, standard deviations, or standard errors


Introduction to Time Series Analysis

Used when the Y or R values are dependent in time (auto- or serial correlation).

Two purposes: a) modelling and simulation; b) forecasting.

Modelling and simulation: ARIMA, Fourier + ARMA, dynamic regression

Forecasting: ARIMA, exponential smoothing, dynamic regression

(A separate course is needed to cover this topic.)

E.g.

$$Y_t = a + bY_{t-1} + cY_{t-2} + dX_t + eX_{t-1}$$


Step Trends

Step Trends without Seasonality

• Nonparametric

– Not adjusted for X: rank-sum test on Y

– Adjusted for X: rank-sum test on R from LOWESS of Y on X

• Mixed

– Not adjusted for X: none

– Adjusted for X: rank-sum test on R from regression of Y on X

• Parametric

– Not adjusted for X: two-sample t-test

– Adjusted for X: ANCOVA of Y on X and group (before/after)
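For the nonparametric, unadjusted case, a step trend can be tested with the Wilcoxon rank-sum test on the "before" and "after" groups. The sketch below uses the large-sample normal approximation with a continuity correction. Ties receive average ranks, but no tie correction is applied to the variance; for small groups an exact table would normally be used instead. The data and helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(before, after):
    """Return (W, z, p): W = rank sum of `after` in the pooled sample."""
    m, n = len(before), len(after)
    ranks = average_ranks(list(before) + list(after))
    w = sum(ranks[m:])
    mean_w = n * (m + n + 1) / 2
    sd_w = math.sqrt(m * n * (m + n + 1) / 12)
    if w > mean_w:
        z = (w - 0.5 - mean_w) / sd_w
    elif w < mean_w:
        z = (w + 0.5 - mean_w) / sd_w
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return w, z, p


# Hypothetical concentrations before and after a known event.
before = [1.2, 1.5, 1.1, 1.8, 1.4]
after = [2.0, 1.9, 2.4, 1.6, 2.2]
w, z, p = rank_sum_test(before, after)
print(w, round(z, 2), round(p, 3))  # 39.0 2.3 0.022
```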


Step Trends with Seasonality

• Nonparametric

– Not adjusted for X: seasonal rank-sum test on Y

– Adjusted for X: seasonal rank-sum test on R from LOWESS of Y on X

• Mixed

– Not adjusted for X: two-sample t-test on deseasonalized Y

– Adjusted for X: seasonal rank-sum test on R from regression of Y on X

• Parametric

– Not adjusted for X: ANCOVA of Y on seasonal terms and group

– Adjusted for X: ANCOVA of Y on X, seasonal terms and group


Summary

• First decide the type of trend to be analyzed:

– step vs. monotonic

– check assumptions

• Nonparametric vs. parametric

• Are there exogenous variables?

– Remove them first or model them in one go

• Seasonality?

• Always plot the data: boxplots and X-Y plots are most useful.
