
International Journal of Forecasting 19 (2003) 467–475
www.elsevier.com/locate/ijforecast
doi:10.1016/S0169-2070(02)00059-6

Debiasing forecasts: how useful is the unbiasedness test?☆

Paul Goodwin^a,*, Richard Lawton^b

^a The Management School, University of Bath, Bath BA2 7AY, UK
^b Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Frenchay, Bristol BS16 1QY, UK

☆ An earlier version of this paper was presented at the 20th International Symposium on Forecasting, Lisbon, June 2000.
* Corresponding author. Tel.: +44-122-532-3594; fax: +44-122-582-6473. E-mail address: [email protected] (P. Goodwin).

Abstract

A number of studies have demonstrated the improvements in accuracy that can result from correcting judgmental forecasts to remove systematic bias. It has been suggested that the 'unbiasedness test', based on the F-distribution, should be employed to determine when to apply correction to forecasts. The effectiveness of using the test for this purpose was investigated under conditions where its underlying assumptions were valid. The results suggest that, even under these conditions, the use of the test is unlikely to be advisable in most practical contexts and that a policy of always correcting forecasts is preferable.
© 2002 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Forecast correction; Bias; Judgmental forecasting; Significance testing

1. Introduction

Forecasts often suffer from systematic biases. These may result from deficiencies in the quantitative model that is used to make the forecasts or, in the case of judgmental forecasting, they may result from psychological or political factors (Goodwin & Wright, 1993; Fildes & Hastings, 1994). If a systematic bias is associated with a set of forecasts (and if it is expected to continue in the future), an obvious strategy is to estimate the bias and then correct subsequent forecasts in order to remove it. A method of correction suggested by Theil (1971) involves regressing the actual outcomes on to the forecasts to obtain a model of the form:

$$a_t = \alpha + \beta p_t + e_t \qquad (1)$$

where $a_t$ is the actual outcome at period $t$, $p_t$ is the forecast for period $t$, $e_t$ is the error at $t$, and $\alpha$ and $\beta$ are the population regression coefficients. The estimate of the regression model in (1) can then be used to provide an estimate of the corrected forecast for time $t$, $\hat{f}_t$, that will be free of the systematic bias, so that:

$$\hat{f}_t = \hat{\alpha} + \hat{\beta} p_t \qquad (2)$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the OLS estimators of $\alpha$ and $\beta$, respectively.
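As an illustration of (1) and (2), a minimal Python sketch is given below. This is illustrative code rather than anything from the original paper, and the function and variable names are arbitrary: it fits the bias-correction regression to past forecast–outcome pairs by OLS and applies the estimated coefficients to new forecasts.

```python
import numpy as np

def fit_theil_correction(actuals, forecasts):
    """OLS of actuals on forecasts, i.e. the regression in Eq. (1): a_t = alpha + beta*p_t + e_t."""
    p = np.asarray(forecasts, dtype=float)
    a = np.asarray(actuals, dtype=float)
    X = np.column_stack([np.ones_like(p), p])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, a, rcond=None)
    return alpha_hat, beta_hat

def correct_forecasts(new_forecasts, alpha_hat, beta_hat):
    """Apply Eq. (2): corrected forecast = alpha_hat + beta_hat * p."""
    return alpha_hat + beta_hat * np.asarray(new_forecasts, dtype=float)

# Illustrative use with made-up numbers: judgmental forecasts that tend to undershoot the outcomes.
past_p = np.array([100.0, 110.0, 95.0, 120.0, 105.0])   # past forecasts
past_a = np.array([108.0, 118.0, 104.0, 127.0, 115.0])  # realised outcomes
a_hat, b_hat = fit_theil_correction(past_a, past_p)
print(correct_forecasts([112.0, 98.0], a_hat, b_hat))
```

Any OLS routine would do; `np.linalg.lstsq` is used here only to keep the sketch dependency-free beyond NumPy.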

Theil (1971) used a decomposition of the mean squared error (MSE) to show how this correction removes systematic bias from past forecasts (i.e. forecasts for which the outcomes have been realised) (see also Moriarty, 1985). Assuming that biases remain constant, the correction can be expected to improve forecasts beyond the periods used to fit the regression model, and a number of successful applications have been reported.



For example, Shaffer (1998) found that correction of commercial forecasts of the US implicit GNP price deflator reduced the MSE of out-of-sample forecasts by up to 25%. Similarly, Ahlburg (1984) found that the correction substantially improved forecasts of US prices and housing starts, while Elgers, May and Murray (1995) applied it to analysts' company earnings forecasts and found that it reduced the MSEs emanating from systematic bias by about 91%. In an analysis of two UK-based manufacturing companies, Goodwin (2000) found that the median absolute percentage error of managers' forecasts was reduced, for out-of-sample periods, by an average of 12% (of its original value) for the first company and 23% for the second.

Despite these reported improvements in accuracy, the application of (2) is, of course, not guaranteed to improve the accuracy of future forecasts, even if the effect of noise on the outcomes is discounted. Changes in the nature of the outcome–forecast relationship over time may mean that corrections based on biases which no longer apply are counterproductive (Goodwin, 1997), while the success of correction may be constrained by other simplifications in the model, such as the assumption of a linear relationship between the outcomes and forecasts. Similarly, the model cannot take into account possible lags in this relationship (Shaffer, 1998) or biases in the use of information about predictor variables (Fildes, 1991).

Even if none of these problems apply, sampling errors in the estimation of $\alpha$ and $\beta$ may still lead to inappropriate and damaging corrections, particularly where the forecasts are really unbiased. This raises an important practical question: given a set of past forecasts and outcomes, how can we determine whether correction of subsequent forecasts is appropriate? If the forecasts are unbiased then, in (1), $\alpha$ and $\beta$ will equal 0 and 1, respectively. One apparent solution is to test the joint null hypothesis that these values apply by using the 'unbiasedness' test (Holden, Peel & Thompson, 1985; Johnston, 1972; Lopes, 1998) on the past forecasts. Correction is then only used if the test indicates that these forecasts are significantly biased. This procedure was recommended by Moriarty (1985) as a design feature of forecasting systems involving management judgments, and Fildes (1991), Shaffer (1998) and Goodwin (1997, 2000) all applied the unbiasedness test to determine whether it suggested that forecasts should be corrected. Ahlburg (1984) applied separate t tests to $\hat{\alpha}$ and $\hat{\beta}$, while Elgers et al. (1995) only tested the significance of $\hat{\beta}$.

One interesting result of some of these studies is that the unbiasedness test applied to past forecasts (and in Ahlburg's case the use of separate t tests on $\hat{\alpha}$ and $\hat{\beta}$) was a poor guide to whether forecast correction would improve the accuracy of subsequent forecasts. For example, Goodwin (2000) found no association between the significance, or otherwise, of the unbiasedness test (employed at the 5% significance level) and the success of subsequent forecast correction. Similar results were reported in Goodwin (1997). Shaffer (1998) states that "improvements were noted . . . even though unbiasedness of the raw forecasts . . . was only marginally rejected [i.e. at the 10% significance level]", while Ahlburg reports that "although the mean correction factor [$\hat{\alpha}$] was not statistically significant at traditional levels, using both correction factors [$\hat{\alpha}$ and $\hat{\beta}$] led to more accurate forecasts". Of course, in some of these studies the assumptions required by the unbiasedness test (and associated tests) may not have been met by the data. For example, the error distributions may have departed from normality, or lack of stationarity in the outcomes or forecasts may have led to autocorrelation in these errors. Alternatively, it may be that the use of conventional levels of significance, like 5%, makes the test too conservative and weights it too heavily against indicating that forecast correction is desirable.

This paper examines the effectiveness of the unbiasedness test as a guide to forecast correction when it is applied under conditions that do meet its underlying assumptions. The first section investigates how the test's calculations are related to forecast correction. This is followed by details of a series of simulation experiments that were carried out to provide an assessment of the value of using the unbiasedness test to determine whether correction should be used. The practical implications of the simulation results are then discussed. Note that only the MSE emanating from systematic bias is of interest in this paper: we are not concerned with the ability of forecasts to predict the random noise in a series, only with their ability to predict the underlying signal. Henceforward, when we refer to the MSE, we will therefore be using it as a measure of how well the forecasts were able to predict the known underlying signal of our simulated series.


Of course, in practice, accuracy will not be the only consideration in forecasting and there are likely to be wider practical concerns relating to organisational behaviour and politics. In particular, the indiscriminate correction of judgmental forecasts may de-motivate forecasters or cause them to make pre-emptory changes to their forecasts in order to nullify the subsequent effects of correction. Correction is therefore likely to be more appropriate when applied by forecast users, particularly when they are separated from those involved in the development of the forecasts. In the discussion which follows, the supplier of the forecasts will be referred to as the 'forecaster' and the person who is considering correcting these forecasts will be referred to as the 'analyst'.

2. Theoretical underpinnings

To test the joint null hypothesis that $\alpha$ equals 0 and $\beta$ equals 1, the unbiasedness test involves the calculation of $F_0$, where:

$$F_0 = \frac{\tfrac{1}{2}\left[ n\hat{\alpha}^2 + (\hat{\beta}-1)^2 \sum p_t^2 + 2n\hat{\alpha}(\hat{\beta}-1)\bar{p} \right]}{\tfrac{1}{n-2}\sum (a_t - \hat{\alpha} - \hat{\beta} p_t)^2} \qquad (3)$$

and $n$ = the number of pairs of forecasts and outcomes in the sample; $\bar{p} = (1/n)\sum p_t$.

Assuming that the $e_t$ (in (1)) are independently and identically distributed as $N(0, \sigma^2)$, and that the null hypothesis is true, $F_0$ has the $F$ distribution with 2 and $n-2$ degrees of freedom. It is useful to examine how the structure of $F_0$ relates to the forecast correction.¹

¹ Note that, although the unbiasedness test assumes that values of $p_t$ are fixed, in practice the forecasts may best be regarded as stochastic, as the forecaster is likely to react to variations in the environment. Moreover, in the simulations discussed later, values of $p_t$ were sampled randomly. However, since these values of $p_t$ were sampled independently of the $e_t$, the distinction between fixed and stochastic regressors should not be of concern (Kmenta, 1986, p. 338).

The size of the desired forecast correction is given by:

$$c_t = f_t - p_t = \alpha + (\beta - 1)p_t \qquad (4)$$

while the correction that is actually applied is:

$$\hat{c}_t = \hat{\alpha} + (\hat{\beta} - 1)p_t \qquad (5)$$

Then (see Appendix A):

$$E(\hat{c}_t^2) = \left[\alpha + (\beta - 1)E(p_t)\right]^2 + (\beta - 1)^2 V(p_t) + \frac{2\sigma^2}{n} \qquad (6)$$

$$= E(c_t^2) + \frac{2\sigma^2}{n} \qquad (7)$$

A number of interesting points can be discerned from this result. First, note that $E(\hat{c}_t^2)$, which represents the expected square of the correction that we will actually apply to the original forecasts, has been partitioned into two terms. The first is the expected square of the desirable correction that we should be applying to the forecasts, and the second is a squared correction that we will actually be applying as a result of sampling errors in estimating $\alpha$ and $\beta$. For example, when there is no bias in the original forecasts, so that $\alpha = 0$, $\beta = 1$ and $c_t = 0$, the expected squared correction applied to the forecasts will be $2\sigma^2/n$.

Secondly, a rearrangement of the formula for $F_0$ (see Appendix D) reveals that it represents the ratio of estimates of the terms in (6): i.e.

$$F_0 = \frac{\left[\hat{\alpha} + (\hat{\beta}-1)E(p_t)\right]^2 + (\hat{\beta}-1)^2 V(p_t)}{2\hat{\sigma}^2/n} \qquad (8)$$

The numerator of $F_0$ is our estimate of the mean squared correction that the forecasts require. The denominator is our estimate of the mean squared correction that we would find ourselves erroneously applying if the forecasts were unbiased. Hence the unbiasedness test poses the question: is our estimate of the mean squared correction required by the forecasts significantly greater than that which we would expect to be estimated for unbiased forecasts?
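To make the mechanics of the test concrete, the sketch below shows one way of computing $F_0$ from a sample of past forecasts and outcomes and referring it to the $F$ distribution with 2 and $n-2$ degrees of freedom. It is illustrative code, not from the original paper, and the names are arbitrary; note that the numerator of (3) is simply half the sum of the squared corrections $\hat{c}_t$ that would be applied, which is the quantity interpreted in (8).

```python
import numpy as np
from scipy import stats

def unbiasedness_test(actuals, forecasts):
    """Joint test of H0: alpha = 0, beta = 1 in a_t = alpha + beta*p_t + e_t (Eq. (3))."""
    a = np.asarray(actuals, dtype=float)
    p = np.asarray(forecasts, dtype=float)
    n = len(a)
    X = np.column_stack([np.ones(n), p])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, a, rcond=None)
    resid = a - (alpha_hat + beta_hat * p)
    sigma2_hat = np.sum(resid ** 2) / (n - 2)          # unbiased estimate of sigma^2
    # Numerator of (3): half the sum of squared corrections that would be applied.
    numerator = (n * alpha_hat ** 2
                 + (beta_hat - 1) ** 2 * np.sum(p ** 2)
                 + 2 * n * alpha_hat * (beta_hat - 1) * p.mean()) / 2
    F0 = numerator / sigma2_hat
    p_value = stats.f.sf(F0, 2, n - 2)                 # upper-tail probability under H0
    return F0, p_value
```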


It is important to note that the discussion so far has concerned the ability of correction to improve past forecasts. In practice, we will be interested in how well the unbiasedness test, based on past data, can provide accurate guidance on whether corrections should be made to future forecasts. In this case, even if the relationship between the outcomes and the forecasts remains constant, the denominator of $F_0$ will tend to be underestimated. This is because the estimate is based on observations to which the regression model has been fitted; obviously, the model will not have been fitted to future outcomes and forecast values. For example, in cases where the $p_t$ are normally distributed, corrections to future forecasts will be such that (see Appendix B):

$$E(\hat{c}_t^2 \mid \alpha = 0, \beta = 1) \approx \frac{2\sigma^2}{n}\left[1 + \frac{2n}{(n-1)^2}\right] \qquad (9)$$

where the regression model has been fitted to $n$ past observations. It can be shown (see Appendix C) that the right hand side of (9) represents the expected mean squared error after correction of the future forecasts (assuming the $p_t$ are normally distributed).

In order to investigate the value of the unbiasedness test as a guide to future forecast correction, under conditions where the true forecast–signal relationship remains constant, a series of simulation experiments was carried out. These experiments are considered next.

3. Details of simulation experiments

The simulation experiments were designed to examine the value of the unbiasedness test, as a guide to whether future forecasts should be corrected, when the conditions appropriate for the test were all met. In particular, these conditions meant that $a_t$ and $f_t$ needed to be stationary, to avoid the danger of spurious regression (Granger & Newbold, 1974; Phillips, 1986). Also the $p_t$ needed to be independent of the error terms, $e_{t+r}$ (for all $t$ and $r$), otherwise the least squares estimates of $\alpha$ and $\beta$ would be biased (Johnston, 1972). These conditions were met by generating random samples of $a_t$ and $p_t$ using the model $a_t = \alpha + \beta p_t + e_t$ where $e_t \sim N(0, \sigma^2)$.

The simulations were carried out as follows. In each experiment, the performance of the test was investigated for 121 pairs of values of $\alpha$ and $\beta$. Typically the $\alpha$ values ranged from $-10$ to $+10$ (incrementing in steps of $+2$) and the $\beta$ values from 0.9 to 1.1 (incrementing in steps of 0.02), though these ranges of values were extended when this was necessary to obtain a complete view of the test's performance. For each pair of $\alpha$ and $\beta$ values (a code sketch of one run of these steps is given after the list):

(i) values of the unbiased forecasts ($f_t$) were generated for $m$ consecutive periods using Monte-Carlo simulation (where $m = 22$, 30, 40 or 50, depending on the simulation run); these outcomes were independently sampled from a normal distribution with a mean of 100 and standard deviation of 20;²

(ii) the 'original' forecasts for these periods ($p_t$) were obtained from $p_t = (f_t - \alpha)/\beta$ (so that $f_t = \alpha + \beta p_t$);

(iii) noise ($e_t$), independently sampled from a normal distribution with a mean of 0 and, depending on the experiment, a standard deviation of 10, 30 or 100, was then added to each value of the unbiased forecast ($f_t$) to obtain the actual outcomes for each period ($a_t$) (so that $a_t = \alpha + \beta p_t + e_t$);

(iv) the first $m - 10$ periods were used to fit the regression model $\hat{a}_t = \hat{\alpha} + \hat{\beta} p_t$, and the unbiasedness test was applied to $\hat{\alpha}$ and $\hat{\beta}$ at the 5% level of significance;

(v) the equation $\hat{f}_t = \hat{\alpha} + \hat{\beta} p_t$ was used to correct forecasts for the remaining 10 periods (as in (2)) and the original mean squared errors, and those of the corrected forecasts, were recorded (recall that these mean squared errors (MSEs) only measured the squared error resulting from systematic bias since the forecasts were not expected to predict the noise);

(vi) steps (i) to (v) were repeated until 1000 replications had been carried out (using 5000 and 10 000 replications made little difference to the results, so 1000 replications were judged to be sufficient).

² As a check on the results and the simulation model, outcomes were also sampled from normal distributions with different means and standard deviations.
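A compressed sketch of one cell of the experiment (a single $(\alpha, \beta)$ pair) is given below. It is an illustrative paraphrase of steps (i)–(vi) with one particular choice of the parameters listed above, not the authors' original simulation code. For each replication it records the out-of-sample MSE, measured against the known signal $f_t$, under the three policies compared in Section 4: never correct, always correct, and correct only when the unbiasedness test is significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_cell(alpha, beta, m=30, noise_sd=30.0, reps=1000, sig_level=0.05):
    """Steps (i)-(vi) for one (alpha, beta) pair; returns the mean MSE of each policy."""
    mse = {"never": [], "always": [], "test": []}
    n_fit = m - 10
    for _ in range(reps):
        f = rng.normal(100, 20, m)                     # (i) unbiased forecasts / underlying signal
        p = (f - alpha) / beta                         # (ii) 'original' biased forecasts
        a = f + rng.normal(0, noise_sd, m)             # (iii) outcomes = signal + noise
        fit, hold = slice(0, n_fit), slice(n_fit, m)
        X = np.column_stack([np.ones(n_fit), p[fit]])
        (a_hat, b_hat), *_ = np.linalg.lstsq(X, a[fit], rcond=None)   # (iv) fit the regression
        resid = a[fit] - (a_hat + b_hat * p[fit])
        s2 = np.sum(resid ** 2) / (n_fit - 2)
        F0 = ((n_fit * a_hat ** 2
               + (b_hat - 1) ** 2 * np.sum(p[fit] ** 2)
               + 2 * n_fit * a_hat * (b_hat - 1) * p[fit].mean()) / 2) / s2   # Eq. (3)
        significant = stats.f.sf(F0, 2, n_fit - 2) < sig_level
        corrected = a_hat + b_hat * p[hold]            # (v) corrected out-of-sample forecasts
        # MSE is measured against the known signal f, not the noisy outcomes a.
        mse["never"].append(np.mean((p[hold] - f[hold]) ** 2))
        mse["always"].append(np.mean((corrected - f[hold]) ** 2))
        chosen = corrected if significant else p[hold]
        mse["test"].append(np.mean((chosen - f[hold]) ** 2))
    return {k: np.mean(v) for k, v in mse.items()}     # (vi) average over replications
```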


4. Results of experiments

Where $\alpha$ and $\beta$ are 0 and 1, respectively, the expected percentage of replications yielding a significant value of $F_0$ at the 5% level of significance is clearly 5%. When the experiment was initially run with 10 000 replications, 4.98% of the replications yielded statistically significant results. This difference between the expected and actual percentage was not significant ($z = 0.09$, $p = 0.464$, using the normal approximation to the binomial distribution), so there was no evidence to suggest that the simulation model was invalid. Other results supported the validity of the model. For example, values for $E(\hat{c}_t^2)$ conformed very closely to those predicted by (7).

A typical set of simulation results is shown in Fig. 1. This compares, for different levels of bias in the original forecasts, the MSE that results from three policies: (i) not correcting at all, (ii) always correcting, and (iii) only correcting when the unbiasedness test is significant at the 5% level. In the simulation shown, the regression model was fitted to 20 forecasts and actuals and the MSE of ten out-of-sample forecasts was recorded before and after correction. The noise variance ($\sigma^2$) was 900.

Recall that, because the $p_t$ in the simulations were sampled from a normal distribution, the expected MSE resulting from 'always correcting' is given by the right hand side of (9) and hence, under the conditions represented in Fig. 1, is approximately 100. Clearly, if the pre-correction MSE, $E(c_t^2)$, exceeds this value then the optimal policy is always to correct. When the MSE of the original forecasts is small (below 100), 'never correcting' is always the best policy, but there is little to choose between this policy and applying the unbiasedness test. This is because, under these conditions, the test tends not to give significant results and therefore serves to protect the analyst from making damaging corrections to the forecasts. However, where the MSE of the original forecasts is larger (between 100 and about 1500 in Fig. 1), and hence correction would be expected to improve the forecasts, there is a possibility that the unbiasedness test will not be significant. In cases where this happens, desirable corrections will not be carried out. When the pre-correction MSE exceeds this range, the test is certain to be significant, so correction is always implemented anyway.

Fig. 1. The effectiveness of using the unbiasedness test as a guide to forecast correction.


This pattern was typical of all the simulation results, although of course the actual values of the MSEs varied, depending on the level of noise and sample size.

This suggests that, where the forecast bias is large, the test offers no advantage relative to a policy of always correcting, and for 'moderate' levels of bias it can be seriously misleading. It is therefore only useful when forecasts are unbiased, or have very small levels of bias. The usefulness of the test for only very low levels of bias occurs because correction will still improve the expected accuracy of future forecasts even where their MSE before correction, $E(c_t^2)$, is relatively small compared to the noise variance, $\sigma^2$. If correction is expected to improve future forecast accuracy then the expected pre-correction MSE must exceed the expected MSE after correction. For example, if the $p_t$ are normally distributed, this will occur where:

$$E(c_t^2) > \frac{2\sigma^2}{n}\left[1 + \frac{2n}{(n-1)^2}\right] \qquad (10)$$

Thus, if $\alpha$ and $\beta$ are estimated from only 12 pairs of observations, $E(c_t^2)$ needs only to be at least 19.97% of the noise variance for correction, based on these estimates, to improve expected accuracy. When $n$ is 60, the corresponding percentage is only 3.45%. If forecasts are based on judgment, the literature suggests that it is reasonable to expect that they will suffer from sizeable biases (Goodwin & Wright, 1993; Webby & O'Connor, 1996; Lawrence, O'Connor & Edmundson, 2000). In this case, the size of the bias is likely to be sufficient to merit a policy of indiscriminate correction, and the use of the unbiasedness test is inadvisable.
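The threshold in (10), expressed as a fraction of the noise variance, can be evaluated directly. The trivial sketch below (illustrative only) reproduces the 19.97% and 3.45% figures quoted above.

```python
def correction_threshold_fraction(n):
    """Right-hand side of (10) divided by sigma^2: the minimum E(c_t^2)/sigma^2 at which correction pays off."""
    return (2.0 / n) * (1 + 2 * n / (n - 1) ** 2)

print(correction_threshold_fraction(12))   # approximately 0.1997
print(correction_threshold_fraction(60))   # approximately 0.0345
```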

The results in Fig. 1 and the expression given in (9) are all based on the $p_t$ being normally distributed. When the simulations were re-run with the $p_t$ following either uniform or lognormal distributions, different results were obtained for the post-correction MSEs, but the general pattern was always similar to that shown in Fig. 1. Similarly, when the robustness of the result was tested on stationary autocorrelated series,³ the use of the unbiasedness test also led to the same or higher MSEs than indiscriminate correction for all but relatively low levels of bias. It has also been assumed so far that the test is being carried out at the 5% significance level. The effect of increasing the significance level (e.g. to 10%) is to increase the resulting post-correction MSE for low levels of bias (because the test suggests that correction should be used more often when it is not desirable) and to correspondingly reduce this MSE for higher levels of bias. As the significance level is increased, the curve representing the use of the unbiasedness test, in Fig. 1, will become more similar to the 'line' representing the decision to 'always correct'. At 100% significance the two will coincide.

³ The use of stationary autocorrelated series violates the assumption that the stochastic regressors are uncorrelated with the $e_t$'s. The results are reported here simply to indicate the robustness of the earlier results.

5. Conclusions

The results of this study suggest that, even where the conditions necessary for the application of the unbiasedness test are met, its guidance on whether forecasts should be corrected is likely to be of negative value in most practical circumstances involving judgmental forecasts. Indeed, because correction is effective even where the systematic bias is fairly low relative to the noise, there needs to be strong prior evidence to suggest that the bias may be very small or non-existent to justify a decision to employ the unbiasedness test. It seems unlikely that such evidence will be available in most settings.

Why does the unbiasedness test offer such poor guidance on whether forecasts should be corrected? The main reason is that it provides answers to the wrong question (Cohen, 1994). To decide whether to correct a set of forecasts we need to answer the question: 'Will accuracy improve if we correct the forecasts?'. To calculate the expected improvement we need to have a probability distribution for the size of the improvement. This probability distribution will be informed by the data we have on past forecasts and actuals, so we require the following conditional probability distribution:

$f(I \mid D)$, where $I$ is the size of the improvement and $D$ the available data.

The unbiasedness test answers a different question.


It answers the question: 'If the forecasts are completely unbiased, what is the probability that our available sample of forecasts and actuals will suggest this level of bias (or worse)?'. It therefore gives us $p(D \text{ or worse} \mid \text{forecasts are unbiased})$ and obtains this probability from the corresponding conditional distribution. Thus, although the argument 'before correcting forecasts, first test them for bias' may sound intuitively reasonable, it is in fact specious. As this study has shown, ignoring this fact can lead to wasted effort and a reduction in forecast accuracy.

Acknowledgements

The authors would like to thank the referees who made several excellent suggestions for the improvement of earlier versions of this paper.

Appendix A. Derivation of $E(\hat{c}_t^2)$

$\hat{c}_t$ = the size of the correction that is actually applied to the original forecasts ($p_t$).

$$E(\hat{c}_t^2) = [E(\hat{c}_t)]^2 + V(\hat{c}_t) \qquad \text{(i)}$$

Correcting past forecasts

Let $p_t$ be a forecast belonging to the sample of $n$ forecasts, $p_1, \ldots, p_n$, that are used to obtain the regression model.

$$E(\hat{c}_t \mid p_t) = \alpha + (\beta - 1)p_t$$

so

$$E(\hat{c}_t) = \alpha + (\beta - 1)E(p_t) \qquad \text{(ii)}$$

$$V(\hat{c}_t) = E[V(\hat{c}_t \mid p_t)] + V[E(\hat{c}_t \mid p_t)]$$

(e.g., see Mood, Graybill & Boes, 1974, p. 159)

$$= E[V(\hat{\alpha} + (\hat{\beta} - 1)p_t)] + V[\alpha + (\beta - 1)p_t]$$

$$= E\left[\sigma^2\left(\frac{1}{n} + \frac{(p_t - \bar{p})^2}{\sum (p_i - \bar{p})^2}\right)\right] + V[\alpha + (\beta - 1)p_t]$$

However,

$$E\left[\frac{(p_t - \bar{p})^2}{\sum (p_i - \bar{p})^2}\right] = \frac{1}{n}$$

To see this, for simplicity, let

$$k_t = \frac{(p_t - \bar{p})^2}{\sum_{i=1}^{n} (p_i - \bar{p})^2}$$

Note that $E(k_t) = k$ is constant for all $t = 1, 2, \ldots, n$ and that $\sum_{t=1}^{n} k_t = 1$. Thus $nk = 1$ and $k = 1/n$. Hence

$$V(\hat{c}_t) = \frac{2\sigma^2}{n} + (\beta - 1)^2 V(p_t) \qquad \text{(iii)}$$

so from (i), (ii) and (iii)

$$E(\hat{c}_t^2) = [\alpha + (\beta - 1)E(p_t)]^2 + (\beta - 1)^2 V(p_t) + \frac{2\sigma^2}{n}$$

Appendix B. Correcting future forecasts

Let $p_j$ be a forecast from the same distribution as $p_1, \ldots, p_n$ but not part of that sample, and assume that $p_j$ and all of the $p_i$ are independently and normally distributed.

$$E(\hat{c}_j) = \alpha + (\beta - 1)E(p_j)$$

as before, but

$$V(\hat{c}_j) = \frac{\sigma^2}{n} + (\beta - 1)^2 V(p_j) + \sigma^2 E\left[\frac{(p_j - \bar{p})^2}{\sum (p_i - \bar{p})^2}\right]$$

Using the approximate expression for the expectation of a quotient given in Mood et al. (1974, p. 181), it can be shown that:

$$E\left[\frac{(p_j - \bar{p})^2}{\sum (p_i - \bar{p})^2}\right] \approx \frac{(n+1)^2}{n(n-1)^2}$$

so that:

$$V(\hat{c}_j) \approx \frac{2\sigma^2}{n}\left[1 + \frac{2n}{(n-1)^2}\right] + (\beta - 1)^2 V(p_j)$$

and

$$E(\hat{c}_j^2) \approx [\alpha + (\beta - 1)E(p_j)]^2 + (\beta - 1)^2 V(p_j) + \frac{2\sigma^2}{n}\left[1 + \frac{2n}{(n-1)^2}\right]$$
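As a numerical sanity check on the in-sample result above (an illustration added here, not part of the original appendix), the average squared correction actually applied can be compared with $E(c_t^2) + 2\sigma^2/n$ by simulating repeated estimation of the correction; under the assumed, illustrative parameter values the two quantities should roughly agree.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma, n, reps = 4.0, 1.05, 30.0, 20, 20000   # assumed illustrative values

avg_sq_correction = []
for _ in range(reps):
    p = rng.normal(100, 20, n)                         # forecasts: mean 100, sd 20
    a = alpha + beta * p + rng.normal(0, sigma, n)     # outcomes generated from Eq. (1)
    X = np.column_stack([np.ones(n), p])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, a, rcond=None)
    c_hat = a_hat + (b_hat - 1) * p                    # corrections actually applied, Eq. (5)
    avg_sq_correction.append(np.mean(c_hat ** 2))

# E(c_t^2) = [alpha + (beta - 1)E(p)]^2 + (beta - 1)^2 V(p), with E(p) = 100 and V(p) = 400
expected_c_sq = (alpha + (beta - 1) * 100) ** 2 + (beta - 1) ** 2 * 400
print(np.mean(avg_sq_correction), expected_c_sq + 2 * sigma ** 2 / n)
```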


Appendix C. Derivation of the post-correction expected mean squared error

The expected post-correction MSE is $E(\text{MSE}) = E[(\hat{c}_t - c_t)^2]$. (For example, if the real correction needed is $c_t = 5$ and the actual correction made is $\hat{c}_t = 2$, then the forecast error (against the underlying signal) after correction will be $5 - 2 = 3$ and the squared error will be 9.)

Let $z = E(\hat{c}_t^2) - E(c_t^2)$.

$$E[(\hat{c}_t - c_t)^2] = E(\hat{c}_t^2) + E(c_t^2) - 2E(\hat{c}_t c_t)$$
$$= E(c_t^2) + E(c_t^2) + z - 2E(\hat{c}_t)E(c_t) - 2\,\mathrm{cov}(\hat{c}_t, c_t)$$
$$= 2[E(c_t^2) - E^2(c_t)] + z - 2\,\mathrm{cov}(\hat{c}_t, c_t)$$
$$= 2V(c_t) + z - 2\,\mathrm{cov}(\hat{c}_t, c_t)$$

But

$$\mathrm{cov}(\hat{c}_t, c_t) = V(c_t)$$

since

$$\hat{c}_t = c_t + \text{random error}$$

Hence:

$$E[(\hat{c}_t - c_t)^2] = z = E(\hat{c}_t^2) - E(c_t^2)$$

so that

$$E(\text{MSE}) = \frac{2\sigma^2}{n}$$

for past forecasts and, for example,

$$E(\text{MSE}) \approx \frac{2\sigma^2}{n}\left[1 + \frac{2n}{(n-1)^2}\right]$$

for future forecasts, where the $p_t$ are normally distributed.

Appendix D. Structure of the F statistic

$$F_0 = \frac{n\hat{\alpha}^2 + (\hat{\beta}-1)^2 \sum p_t^2 + 2n\hat{\alpha}(\hat{\beta}-1)\bar{p}}{\frac{2}{n-2}\sum (a_t - \hat{\alpha} - \hat{\beta} p_t)^2} \qquad \text{(see (3))}$$

$$= \frac{\hat{\alpha}^2 + (\hat{\beta}-1)^2 \sum p_t^2 / n + 2\hat{\alpha}(\hat{\beta}-1)\bar{p}}{2\hat{\sigma}^2/n}$$

$$= \frac{\hat{\alpha}^2 + (\hat{\beta}-1)^2 \left[\big(E(p_t)\big)^2 + V(p_t)\right] + 2\hat{\alpha}(\hat{\beta}-1)E(p_t)}{2\hat{\sigma}^2/n}$$

(since values of $p_t$ are taken to be non-stochastic, $\bar{p} = E(p_t)$ and the variance of the observed $p_t$'s $= V(p_t)$)

$$= \frac{\left[\hat{\alpha} + (\hat{\beta}-1)E(p_t)\right]^2 + (\hat{\beta}-1)^2 V(p_t)}{2\hat{\sigma}^2/n}$$

References

Ahlburg, D. A. (1984). Forecasting evaluation and improvement using Theil's decomposition. Journal of Forecasting, 3, 345–351.
Cohen, J. (1994). The earth is round (p < 0.05). American Psychologist, 49, 997–1003.
Elgers, P. T., May, H. L., & Murray, D. (1995). Note on adjustments to analysts' earning forecasts based upon systematic cross-sectional components of prior-period errors. Management Science, 41, 1392–1396.
Fildes, R. (1991). Efficient use of information in the formation of subjective industry forecasts. Journal of Forecasting, 10, 597–617.
Fildes, R., & Hastings, R. (1994). The organisation and improvement of market forecasting. Journal of the Operational Research Society, 45, 1–16.
Goodwin, P. (1997). Adjusting judgemental extrapolations using Theil's method and discounted weighted regression. Journal of Forecasting, 16, 37–46.
Goodwin, P. (2000). Correct or combine? Mechanically integrating judgmental forecasts with statistical methods. International Journal of Forecasting, 16, 261–275.
Goodwin, P., & Wright, G. (1993). Improving judgmental time series forecasting: a review of the guidance provided by research. International Journal of Forecasting, 9, 147–161.
Granger, C. W. J., & Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2, 111–120.
Holden, K., Peel, D. A., & Thompson, J. L. (1985). Expectations: theory and evidence. New York: St. Martin's Press.
Johnston, J. (1972). Econometric methods, 2nd ed. New York: McGraw-Hill.
Kmenta, J. (1986). Elements of econometrics, 2nd ed. New York: Macmillan.
Lawrence, M., O'Connor, M., & Edmundson, R. (2000). A field study of sales forecasting accuracy and processes. European Journal of Operations Research, 122, 151–160.
Lopes, A. S. (1998). On the 'restricted cointegration test' as a test of the rational expectations hypothesis. Applied Economics, 30, 269–278.
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics, 3rd ed. Tokyo: McGraw-Hill.


Moriarty, M. M. (1985). Design features of forecasting systems involving management judgments. Journal of Marketing Research, 22, 353–364.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics, 33, 311–340.
Shaffer, S. (1998). Information content of forecast errors. Economics Letters, 59, 45–48.
Theil, H. (1971). Applied economic forecasting. Amsterdam: North-Holland Publishing Company.
Webby, R., & O'Connor, M. (1996). Judgmental and statistical time series forecasting: A review of the literature. International Journal of Forecasting, 12, 91–118.

Biographies: Paul GOODWIN is a Lecturer in Management Science in the Management School at the University of Bath. His research interests focus on the role of judgment in forecasting and decision-making and he holds a PhD from the University of Lancaster. He has published articles in a number of journals including the International Journal of Forecasting, the Journal of Forecasting, Omega and the Journal of Management Studies. He is also a co-editor of Forecasting with Judgment, published by Wiley.

Richard LAWTON is a Senior Lecturer in Operational Research at the University of the West of England, where he is also Chair of the Faculty Board for the Faculty of Computing, Engineering and Mathematical Sciences. He holds a PhD from the University of Bath and his research interests focus on short-term forecasting methods, including automatic procedures for method selection. His publications in this area have appeared in the International Journal of Forecasting.