Time series analysis and forecasting solutions
CHAPTER 2
A REVIEW OF BASIC STATISTICAL CONCEPTS
ANSWERS TO ODD NUMBERED PROBLEMS
1. Descriptive Statistics
Variable   N    Mean   Median  Tr Mean  StDev  SE Mean
C1        28   21.32   17.00    20.69   13.37     2.53

Variable   Min    Max     Q1     Q3
C1        5.00  54.00  11.25  28.75
a. X̄ = 21.32
b. S = 13.37
c. S² = 178.76
d. If the policy is successful, smaller orders will be eliminated and the mean will increase.
e. If the change causes all customers to consolidate a number of small orders into large orders, the standard deviation will probably decrease. Otherwise, it is very difficult to tell how the standard deviation will be affected.
f. The best forecast over the long-term is the mean of 21.32.
3. a. Point estimate:
b. 1 − α = .95, Z = 1.96, n = 30: (5.85%, 15.67%)
c. df = 30 − 1 = 29, t = 2.045: (5.64%, 15.88%)
d. We see that the 95% confidence intervals in b and c are not much different. This explains why a sample of size n = 30 is often taken as the cutoff between large and small samples.
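The comparison in parts b and c can be sketched numerically. The sample mean and standard deviation below are hypothetical stand-ins (the raw data are not reproduced here); only n = 30 and the two critical values come from the solution.

```python
import math

# Hypothetical sample mean and standard deviation; only n = 30 and the
# critical values (z = 1.96, and t = 2.045 with df = 29) come from the solution.
def conf_interval(mean, sd, n, critical):
    half_width = critical * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

n = 30
mean, sd = 10.76, 13.7                         # assumed values for illustration
z_interval = conf_interval(mean, sd, n, 1.96)  # large-sample (normal) interval
t_interval = conf_interval(mean, sd, n, 2.045) # small-sample (t) interval
print(z_interval, t_interval)
```

The t interval is only slightly wider than the z interval, which is the point of part d: n = 30 is commonly taken as the large-sample cutoff.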
5. H0: μ = 12.1   n = 100   α = .05
H1: μ > 12.1   S = 1.7   X̄ = 13.5

Reject H0 if Z > 1.645

Z = (13.5 − 12.1)/(1.7/√100) = 8.235
Reject H0 since the computed Z (8.235) is greater than the critical Z (1.645). The mean has increased.
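The z statistic in Problem 5 can be checked with a few lines of Python; all numbers are taken from the solution above.

```python
import math

# One-sample z test from Problem 5: H0: mu = 12.1 vs H1: mu > 12.1.
mu0, xbar, s, n = 12.1, 13.5, 1.7, 100
z = (xbar - mu0) / (s / math.sqrt(n))
reject = z > 1.645          # one-sided critical value at alpha = .05
print(round(z, 3), reject)
```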
7. n = 60, two-sided test, α = .05, critical value: |Z| = 1.96

Test statistic: Z = −2.67

Since |−2.67| = 2.67 > 1.96, reject H0 at the 5% level. The mean satisfaction rating is different from 5.9.

p-value: P(Z < −2.67 or Z > 2.67) = 2·P(Z > 2.67) = 2(.0038) = .0076, very strong evidence against H0.
9. H0: μ = 700   H1: μ ≠ 700   n = 50   S = 50   X̄ = 715   α = .05

Reject H0 if Z < −1.96 or Z > 1.96

Z = (715 − 700)/(50/√50) = 2.12

Since the calculated Z is greater than the critical Z (2.12 > 1.96), reject the null hypothesis. The forecast does not appear to be reasonable.

p-value: P(Z < −2.12 or Z > 2.12) = 2·P(Z > 2.12) = 2(.017) = .034, strong evidence against H0.
11. a.
b. Positive
c. ΣY = 6058, ΣY² = 4,799,724, ΣX = 59, ΣX² = 513, ΣXY = 48,665, r = .938
13. This is a good population for showing how random samples are taken. If three-digit random numbers are generated from Minitab as demonstrated in Problem 10, the selected items for the sample can be easily found. In this population the parameter value is 0.06, so most students will not reject a null hypothesis stating this value. The few students who erroneously reject H0 demonstrate Type I error.
15. n = 175. Point estimate: X̄. 98% confidence interval: 1 − α = .98, Z = 2.33
(43.4, 47.0)
Hypothesis test: two-sided, α = .02, critical value: |Z| = 2.33

Test statistic: |Z| = 1.54

Since |Z| = 1.54 < 2.33, do not reject H0 at the 2% level.

As expected, the results of the hypothesis test are consistent with the confidence interval for μ; μ = 44 is not ruled out by either procedure.
CHAPTER 3
EXPLORING DATA PATTERNS AND CHOOSING A FORECASTING TECHNIQUE
ANSWERS TO ODD NUMBERED PROBLEMS
1. Qualitative forecasting techniques rely on human judgment and intuition. Quantitative forecasting techniques rely more on manipulation of past historical data.
3. The secular trend of a time series is the long-term component that represents the growth or decline in the series over an extended period of time. The cyclical component is the wave-like fluctuation around the trend. The seasonal component is a pattern of change that repeats itself year after year. The irregular component measures the variability of the time series after the other components have been removed.
5. The autocorrelation coefficient measures the correlation between a variable, lagged one or more periods, and itself.
7. a. nonstationary series b. stationary series c. nonstationary series d. stationary series
9. Naive methods, simple averaging methods, moving averages, simple exponential smoothing, and Box-Jenkins methods. Examples are: the number of breakdowns per week on an assembly line having a uniform production rate; the unit sales of a product or service in the maturation stage of its life cycle; and the number of sales resulting from a constant level of effort.
11. Classical decomposition, Census II, Winters' exponential smoothing, time series multiple regression, and Box-Jenkins methods. Examples are: electrical consumption, summer/winter activities (sports like skiing), clothing, agricultural growing seasons, and retail sales influenced by holidays, three-day weekends, and school calendars.
13.
Year   Value   Change
1985   2,413      *
1986   2,407     -6
1987   2,403     -4
1988   2,396     -7
1989   2,403      7
1990   2,448     45
1991   2,371    -77
1992   2,362     -9
1993   2,334    -28
1994   2,362     28
1995   2,336    -26
1996   2,344      8
1997   2,384     40
1998   2,244   -140
Yes! The original series has a decreasing trend.
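A short script can verify the period-to-period changes tabulated in Problem 13. The 1990 value is taken as 2,448, which is consistent with the printed changes of +45 and −77.

```python
# Period-to-period changes for the Problem 13 series. The 1990 value is
# taken as 2,448, consistent with the printed changes +45 and -77.
values = [2413, 2407, 2403, 2396, 2403, 2448, 2371,
          2362, 2334, 2362, 2336, 2344, 2384, 2244]
changes = [b - a for a, b in zip(values, values[1:])]
print(changes)
```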
15. a. MPE b. MAPE c. MSE
17. a. Ȳ = 4465.7   Σ(Yt−1 − Ȳ) = 2396.1   Σ(Yt − Ȳ)² = 55,912,891
Σ(Yt − Ȳ)(Yt−1 − Ȳ) = 50,056,722   Σ(Yt − Ȳ)(Yt−2 − Ȳ) = 44,081,598

r1 = 50,056,722/55,912,891 = .895

H0: ρ1 = 0   H1: ρ1 ≠ 0
Reject H0 if t < −2.069 or t > 2.069

SE(r1) = √(1/n) = √(1/24) = .204

t = r1/SE(r1) = .895/.204 = 4.39

Since the computed t (4.39) is greater than the critical t (2.069), reject the null hypothesis: ρ1 ≠ 0.

H0: ρ2 = 0   H1: ρ2 ≠ 0
Reject H0 if t < −2.069 or t > 2.069

r2 = 44,081,598/55,912,891 = .788

SE(r2) = √((1 + 2r1²)/n) = √((1 + 2(.895)²)/24) = .33

t = r2/SE(r2) = .788/.33 = 2.39

Since the computed t (2.39) is greater than the critical t (2.069), reject the null hypothesis: ρ2 ≠ 0.
b. The data are nonstationary.
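The lag-k autocorrelation and its t statistic used in Problem 17 can be sketched as follows. The series below is an illustrative trending sequence (the loan data themselves are not reproduced here); the formulas match the solution: r_k = Σ(Yt − Ȳ)(Yt−k − Ȳ)/Σ(Yt − Ȳ)², with SE(r1) = √(1/n).

```python
import math

# Lag-k autocorrelation r_k and the t statistic for H0: rho_1 = 0, as in
# Problem 17. The series here is an illustrative trending sequence, not
# the loan data.
def autocorr(y, k):
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

y = [float(v) for v in range(1, 25)]   # n = 24, strongly trending
r1 = autocorr(y, 1)
t1 = r1 / math.sqrt(1 / len(y))        # SE(r1) = sqrt(1/n)
print(round(r1, 3), round(t1, 2))
```

As with the loan series, a strong trend produces a large r1 and a significant t statistic.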
Autocorrelation Function for Loans

Lag  Corr    T     LBQ
 1   0.90   4.39   21.74
 2   0.79   2.39   39.37
 3   0.67   1.68   52.85
 4   0.56   1.25   62.57
 5   0.43   0.92   68.73
 6   0.31   0.63   72.04
19. Figure 3.18 - The data are nonstationary. (Trended data) Figure 3.19 - The data are random. Figure 3.20 - The data are seasonal. (Monthly data) Figure 3.21 - The data are stationary and have some pattern that could be modeled.
21. a. Time series plot follows
b. The sales time series appears to vary about a fixed level so it is stationary.
c. The sample autocorrelation function for the sales series follows
The sample autocorrelations die out rapidly. This behavior is consistent with a stationary series. Note that the sales data are not random: sales in adjacent weeks tend to be positively correlated.
Since, in this case, the residuals differ from the original observations by a constant, the residual autocorrelations show the same pattern as the autocorrelations of the original series.
CHAPTER 4
MOVING AVERAGES AND SMOOTHING METHODS
ANSWERS TO ODD NUMBERED PROBLEMS
1. Exponential smoothing
3. Moving average
5. Winters' three-parameter linear and seasonal model
7.
Price   AVER1    FITS1    RESI1
19.39      *        *        *
18.96      *        *        *
18.20   18.8500     *        *
17.89   18.3500  18.8500  -0.96000
18.43   18.1733  18.3500   0.08000
19.98   18.7667  18.1733   1.80667
19.51   19.3067  18.7667   0.74333
20.63   20.0400  19.3067   1.32333
19.78   19.9733  20.0400  -0.26000
21.25   20.5533  19.9733   1.27667
21.18   20.7367  20.5533   0.62667
22.14   21.5233  20.7367   1.40333

Accuracy Measures
MAPE: 4.63186   MAD: 0.94222   MSD: 1.17281
The naïve approach is better.
a. 221.2 is forecast for period 9
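The AVER1/FITS1/RESI1 columns in the Problem 7 output can be reproduced directly: AVER1 is the trailing 3-period mean, and FITS1 is that average shifted forward one period to serve as the one-step-ahead forecast.

```python
# 3-period moving average as in the Problem 7 output: AVER1 is the
# trailing mean, FITS1 is AVER1 shifted one period, RESI1 the residual.
prices = [19.39, 18.96, 18.20, 17.89, 18.43, 19.98,
          19.51, 20.63, 19.78, 21.25, 21.18, 22.14]
k = 3
aver = [None] * (k - 1) + [round(sum(prices[i - k + 1:i + 1]) / k, 4)
                           for i in range(k - 1, len(prices))]
fits = [None] + aver[:-1]              # one-step-ahead forecast
resi = [None if f is None else round(y - f, 5)
        for y, f in zip(prices, fits)]
print(aver[2], fits[3], resi[3])
```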
9. 3-month moving-average
Period  Yield   AVER1    FITS1    RESI1
   1     9.29      *        *        *
   2     9.99      *        *        *
   3    10.16    9.8133     *        *
   4    10.25   10.1333   9.8133   0.436667
   5    10.61   10.3400  10.1333   0.476667
   6    11.07   10.6433  10.3400   0.730000
   7    11.52   11.0667  10.6433   0.876667
   8    11.09   11.2267  11.0667   0.023333
   9    10.80   11.1367  11.2267  -0.426667
  10    10.50   10.7967  11.1367  -0.636667
  11    10.86   10.7200  10.7967   0.063333
  12     9.97   10.4433  10.7200  -0.750000

Accuracy Measures
MAPE: 4.58749   MAD: 0.49111   MSD: 0.31931   MPE: .6904
Row  Period  Forecast   Lower    Upper
 1     13     10.4433   9.33579  11.5509
[Moving average (length 3) fit for Yield: plot of actual, predicted, and forecast values. MAPE: 4.58749, MAD: 0.49111, MSD: 0.31931]
b. 5-month moving-average
Period  Yield   MA      Predict  Error
   1     9.29     *        *       *
   2     9.99     *        *       *
   3    10.16     *        *       *
   4    10.25     *        *       *
   5    10.61   10.060     *       *
   6    11.07   10.416   10.060   1.010
   7    11.52   10.722   10.416   1.104
   8    11.09   10.908   10.722   0.368
   9    10.80   11.018   10.908  -0.108
  10    10.50   10.996   11.018  -0.518
  11    10.86   10.954   10.996  -0.136
  12     9.97   10.644   10.954  -0.984
Row Period Forecast Lower Upper 1 13 10.644 9.23041 12.0576
Accuracy Measures
MAPE: 5.58295   MAD: 0.60400   MSD: 0.52015   MPE: .7100
[Moving average (length 5) fit for Yield: plot of actual, predicted, and forecast values. MAPE: 5.58295, MAD: 0.60400, MSD: 0.52015]
g. Use 3-month moving average forecast: 10.4433
11.
Time  Demand  Smooth   Predict  Error
  1    205    205.000  205.000    0.0000
  2    251    228.000  205.000   46.0000
  3    304    266.000  228.000   76.0000
  4    284    275.000  266.000   18.0000
  5    352    313.500  275.000   77.0000
  6    300    306.750  313.500  -13.5000
  7    241    273.875  306.750  -65.7500
  8    284    278.938  273.875   10.1250
  9    312    295.469  278.938   33.0625
 10    289    292.234  295.469   -6.4688
 11    385    338.617  292.234   92.7656
 12    256    297.309  338.617  -82.6172
Accuracy Measures MAPE: 14.67 MAD: 43.44 MSD: 2943.24
Row  Period  Forecast  Lower    Upper
 1     13     297.309  190.879  403.738
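The smoothed values in the Problem 11 output follow the usual recursion S_t = αY_t + (1 − α)S_{t−1} with α = .5, initialized at the first observation; the forecast for period 13 is the final smoothed value.

```python
# Single exponential smoothing with alpha = 0.5, matching the Problem 11
# output; the forecast for period 13 is the last smoothed value.
demand = [205, 251, 304, 284, 352, 300, 241, 284, 312, 289, 385, 256]
alpha = 0.5

smooth = [float(demand[0])]            # initialize at the first observation
for y in demand[1:]:
    smooth.append(alpha * y + (1 - alpha) * smooth[-1])

forecast = smooth[-1]
print(round(forecast, 3))
```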
[Single exponential smoothing (Alpha: 0.500) fit for Demand: plot of actual, predicted, and forecast values. MAPE: 14.67, MAD: 43.44, MSD: 2943.24]
13. a. α = .4
Accuracy Measures MAPE: 14.05 MAD: 24.02 MSD: 1174.50
Row Period Forecast Lower Upper
1 57 326.367 267.513 385.221
b. α = .6
Accuracy Measures MAPE: 14.68 MAD: 24.56 MSD: 1080.21
Row Period Forecast Lower Upper 1 57 334.07 273.889 394.251
c. A smoothing constant of approximately .6 appears to be best.
d. No! When the residual autocorrelations shown above are examined, some of them were found to be significant.
15. The autocorrelation function shows that the data are seasonal with a slight trend. Therefore, the Winters’ model is used to forecast revenues.
Smoothing Constants - Alpha (level): 0.8 Beta (trend): 0.1 Gamma (seasonal): 0.1
An examination of the autocorrelation coefficients for the residuals of this Winters’ model shown below indicates that none of them are significantly different from zero.
CHAPTER 5
TIME SERIES AND THEIR COMPONENTS
ANSWERS TO ODD NUMBERED PROBLEMS
1. The purpose of decomposing a time series variable is to observe its various elements in isolation. By doing so, insights into the causes of the variability of the series are
frequently gained. A second important reason for isolating time series components is to facilitate the forecasting process.
3. The basic forces that affect and help explain the trend-cycle of a series are population growth, price inflation, technological change, and productivity increases.
5. Weather and calendar events such as holidays affect the seasonal component.
7. a.
Year    Y       T        C
1980   11424   11105.2  102.871
1981   12811   12900.6   99.306
1982   14566   14695.9   99.116
1983   16542   16491.3  100.307
1984   19670   18286.7  107.565
1985   20770   20082.1  103.426
1986   22585   21877.4  103.234
1987   23904   23672.8  100.977
1988   25686   25468.8  100.855
1989   26891   27263.6   98.633
1990   29073   29059.0  100.048
1991   28189   30854.3   91.362
1992   30450   32649.7   93.263
1993   31698   34445.1   92.025
1994   35435   36240.5   97.777
1995   37828   38035.8   99.454
1996   42484   39831.2  106.660
1997   44580   41626.6  107.095
b. T = 9309.81 + 1795.38X

c. T = 9309.81 + 1795.38(19) = 43,422.03

d. If there is a cyclical effect, it is very slight.
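The T and C columns in part a, and the forecast in part c, can be reproduced from the fitted trend: T = 9309.81 + 1795.38X with X = 1 for 1980, and C = 100·Y/T.

```python
# Trend and cyclical index from Problem 7: T = 9309.81 + 1795.38*X
# (X = 1 for 1980) and C = 100*Y/T.
def trend(x):
    return 9309.81 + 1795.38 * x

y_1980 = 11424
c_1980 = 100 * y_1980 / trend(1)
print(round(trend(1), 1), round(c_1980, 3), round(trend(19), 2))
```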
9. Y = TS = 850(1.12) = $952
11. All of the statements are correct except d.
13. a. The regression equation and seasonal indexes are shown below. The trend is the most important but both trend and seasonal should be used to forecast.
Trend Line Equation = 2241.77 + 25.5111X
Seasonal Indices

Period  Index
  1     0.96392
  2     1.02559
  3     1.00323
  4     1.00726
Accuracy of Model MAPE: 3.2 MAD: 87.3 MSD: 11114.7
b. Forecasts
Row  Period  Forecast
 1     43    3349.53
 2     44    3388.67
c. The forecast for the third quarter is very accurate (3,349.5 versus 3,340). The forecast for the fourth quarter is high compared to Value Line (3,388.7 versus 3,300). Computer-generated graphs for this problem follow:
[Decomposition Fit for Sales: plot of actual, predicted, and forecast values. MAPE: 3.2, MAD: 87.3, MSD: 11114.7]
15. a. Additive Decomposition of Ln(Sales)
Data: LnCavSales   Length: 77   NMissing: 0

Trend Line Equation
Yt = 4.61322 + 2.19E-02*t
Seasonal Indices
Period  Index
   1     0.33461
   2    -0.01814
   3    -0.40249
   4    -0.63699
   5    -0.71401
   6    -0.57058
   7    -0.27300
   8    -0.00120
   9     0.46996
  10     0.72291
  11     0.74671
  12     0.34220
b. Pronounced trend and seasonal components. Would use both to forecast.
c. Forecasts
Date       Period  Forecast (Ln(Sales))  Forecast (Sales)
Jun. 2000    78         5.74866               314
Jul. 2000    79         6.06811               432
Aug. 2000    80         6.36178               579
Sep. 2000    81         6.85482               948
Oct. 2000    82         7.12964              1248
Nov. 2000    83         7.17532              1307
Dec. 2000    84         6.79267               891
d. See last column of table in part c.

e. Forecasts of sales developed from the additive decomposition are higher (for all months, June 2000 through December 2000) than those developed from the multiplicative decomposition. Forecasts from the multiplicative decomposition appear to be a little more consistent with the recent behavior of the Cavanaugh sales time series.
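The part c forecasts are assembled by adding the fitted trend value and the month's seasonal index on the log scale and then exponentiating. Because the printed slope (2.19E-02) is rounded, the result matches the printed forecast only approximately.

```python
import math

# Rebuild the June 2000 forecast (period t = 78) from the additive
# decomposition of Ln(Sales): fitted trend plus the June seasonal index
# (seasonal period 6), then exponentiate. The printed slope 2.19E-02 is
# rounded, so the match with the printed 5.74866 is approximate.
intercept, slope = 4.61322, 2.19e-02
june_index = -0.57058

t = 78
ln_forecast = intercept + slope * t + june_index
sales_forecast = math.exp(ln_forecast)
print(round(ln_forecast, 5), round(sales_forecast))
```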
17. a.
Variation appears to be increasing with the level of the series. Multiplicative decomposition may be appropriate, or additive decomposition with the logarithms of demand.
b. Neither a multiplicative decomposition nor an additive decomposition with a linear trend works well for this series. This time series is best modeled with other methods. The multiplicative decomposition is pictured below.
c. Seasonal Indices (Multiplicative Decomposition for Demand)

Period  Index      Period  Index     Period  Index
  1     0.951313     5     1.00495     9     1.04368
  2     0.955088     6     1.00093    10     0.98570
  3     0.966953     7     1.02748    11     0.99741
  4     0.988927     8     1.07089    12     1.00669
Demand tends to be relatively high in the summer months.
d. Forecasts derived from a multiplicative decomposition of demand (see plot below).
Date  Period  Forecast
Oct.    130   172.343
Nov.    131   175.773
Dec.    132   178.803
19. a. JAN: 600/1.20 = 500

500(1.37) = 685 people estimated for FEB
b. T = 140 + 5t; for JAN 2003, t = 72: T = 140 + 5(72) = 500

JAN = (140 + 5(72))(1.20) = (500)(1.20) = 600
FEB = (140 + 5(73))(1.37) = (505)(1.37) = 692
MAR = (140 + 5(74))(1.00) = (510)(1.00) = 510
APR = (140 + 5(75))(0.33) = (515)(0.33) = 170
MAY = (140 + 5(76))(0.47) = (520)(0.47) = 244
JUN = (140 + 5(77))(1.25) = (525)(1.25) = 656
JUL = (140 + 5(78))(1.53) = (530)(1.53) = 811
AUG = (140 + 5(79))(1.51) = (535)(1.51) = 808
SEP = (140 + 5(80))(0.95) = (540)(0.95) = 513
OCT = (140 + 5(81))(0.60) = (545)(0.60) = 327
NOV = (140 + 5(82))(0.82) = (550)(0.82) = 451
DEC = (140 + 5(83))(0.97) = (555)(0.97) = 538
c. 5
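The part b forecasts are each a trend value times the month's seasonal index (t = 72 for January 2003); a quick check:

```python
# Part b forecasts: trend value (140 + 5t, t = 72 for JAN 2003) times
# the month's seasonal index.
def trend(t):
    return 140 + 5 * t

indices = {"JAN": 1.20, "FEB": 1.37, "MAR": 1.00, "APR": 0.33,
           "MAY": 0.47, "JUN": 1.25, "JUL": 1.53, "AUG": 1.51,
           "SEP": 0.95, "OCT": 0.60, "NOV": 0.82, "DEC": 0.97}

forecasts = {month: round(trend(72 + i) * s)
             for i, (month, s) in enumerate(indices.items())}
print(forecasts)
```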
23. 1289.73(2.847) = 3,671.86
CHAPTER 6
REGRESSION ANALYSIS
ANSWERS TO ODD NUMBERED PROBLEMS
1. Option b is inconsistent because the regression coefficient and the correlation coefficient must have the same sign.
3. Correlation of Sales and Expend. = 0.848
The regression equation is
Sales = 828 + 10.8 Expend.

Predictor  Coef    StDev  T     P
Constant   828.1   136.1  6.08  0.000
Expend.    10.787  2.384  4.52  0.000
S = 67.19 R-Sq = 71.9% R-Sq(adj) = 68.4%
Analysis of Variance
Source      DF  SS      MS     F      P
Regression   1  92432   92432  20.47  0.000
Error        8  36121    4515
Total        9  128552
a. Yes, because r = .848 and t = 4.52.
b. Y = 828 + 10.8X
c. Y = 828 + 10.8(50) = $1368
d. 72% since r2 = .719
e. Unexplained sum of squares = 36,121. Divide by df = (n − 2) = 8 to get 4515.

√4515 = 67.19 = sy.x

f. Total sum of squares is 128,552. Divide by df = (n − 1) = 9 to get the variance, 14,283.6. The square root is Y's standard deviation, sy = 119.5.
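Parts e and f amount to converting sums of squares into standard deviations, which is easy to verify:

```python
import math

# Parts e and f: standard error of estimate from the error sum of
# squares, and Y's standard deviation from the total sum of squares.
sse, sst, n = 36121, 128552, 10

s_yx = math.sqrt(sse / (n - 2))    # = sqrt(4515.125)
s_y = math.sqrt(sst / (n - 1))     # = sqrt(14283.6)
print(round(s_yx, 2), round(s_y, 1))
```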
5. The regression equation is
Cost = 208 + 70.9 Age
Predictor  Coef    StDev  T     P
Constant   208.20  75.00  2.78  0.027
Age        70.918  9.934  7.14  0.000

S = 111.6   R-Sq = 87.9%   R-Sq(adj) = 86.2%

Analysis of Variance

Source      DF  SS      MS      F      P
Regression   1  634820  634820  50.96  0.000
Error        7   87197   12457
Total        8  722017
a.
[Scatter plot of Cost versus Age]
b. Positive
c. ΣY = 6058, ΣY² = 4,799,724, ΣX = 59, ΣX² = 513, ΣXY = 48,665, r = .938
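The part c correlation can be recomputed from the summary sums. The sample size n = 9 is inferred from the ANOVA table above (Total df = 8).

```python
import math

# Part c: correlation from summary sums; n = 9 is inferred from the
# ANOVA table (Total df = 8).
n = 9
sum_y, sum_y2 = 6058, 4_799_724
sum_x, sum_x2 = 59, 513
sum_xy = 48_665

num = n * sum_xy - sum_x * sum_y
den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = num / den
print(round(r, 3))
```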
d. Y = 208.2033 + 70.9181X
e. H0: β1 = 0   H1: β1 ≠ 0
Reject H0 if t < -2.365 or t > 2.365
Reject the null hypothesis. Age and maintenance cost are linearly related in the population.
f. Y = 208.2033 + 70.9181(5) = 562.7938 or $562.80
7. The regression equation is
Orders = 15.8 + 1.11 Catalogs

Predictor  Coef    StDev   T     P
Constant   15.846  3.092   5.13  0.000
Catalogs   1.1132  0.3596  3.10  0.011
S = 5.757 R-Sq = 48.9% R-Sq(adj) = 43.8%
a. r = .7 and t = 3.1 so a significant relationship exists between Y and X.
b. Y = 15.846 + 1.1132X
c. sy.x = 5.757
d. Analysis of Variance

Source      DF  SS      MS      F     P
Regression   1  317.53  317.53  9.58  0.011
Error       10  331.38   33.14
Total       11  648.92
e. 49% since r2 = .489
f. H0: β1 = 0   H1: β1 ≠ 0
Reject H0 if t < -3.169 or t > 3.169
Fail to reject the null hypothesis. Orders received and catalogs distributed are not linearly related in the population.
g. H0: β1 = 0   H1: β1 ≠ 0
Reject H0 if F > 10.04
Yes and Yes
h. Ŷ = 15.846 + 1.1132(10) = 26.98, or about 26,978 orders
9. a. The two firms seem to be using very similar rationale since r = .959.
b. If ABC bids 101, the prediction for Y becomes COMP = −3.257 + 1.03435(101) = 101.212, the point estimate. For an interval estimate using 95% confidence the appropriate standard error is sy.x, since only the variability of the data points around the (population) regression line need be considered: 101.205 ± 1.96(.7431) = 101.205 ± 1.456, or 99.749 to 102.661.

c. If the data constitute a sample, the point estimate for Y would be the same, but the interval estimate would use the standard error of the forecast instead of sy.x, since the variability of sample regression lines around the population regression line would have to be considered along with the scatter of sample Y values around the calculated line. The probability of winning the bid would then involve the same calculation as in part b, except that sf would be used:

Z = (101 − 101.212)/.768 = −.276, where sf = .768, giving an area of .1103. (.5 − .1103) = .3897.
11. The regression equation is
Permits = 2217 - 145 Rate

Predictor  Coef     StDev  T      P
Constant   2217.4   316.2   7.01  0.000
Rate       -144.95   27.96  -5.18  0.000

S = 144.3   R-Sq = 79.3%   R-Sq(adj) = 76.4%

Analysis of Variance

Source      DF  SS      MS      F      P
Regression   1  559607  559607  26.88  0.000
Error        7  145753   20822
Total        8  705360
a.
b. ΣY = 5375, ΣY² = 3,915,429, ΣX = 100.6, ΣX² = 1151.12, ΣXY = 56,219.8, r = −0.8907
c. Reject H0 if t < −2.365 or t > 2.365.

Computed t = −5.1842. Reject the null hypothesis: interest rates and building permits are linearly related in the population.
d. 44.9474
e. r2 = .7934
f. Using knowledge of the linear relationship between interest rates and building permits (r = -0.8907) we can explain 79.3% of the building permit variable variance.
13. The regression equation is
Defects = -17.7 + 0.355 Size

Predictor  Coef     StDev    T      P
Constant   -17.731  4.626   -3.83   0.003
Size       0.35495  0.02332  15.22  0.000

S = 7.863   R-Sq = 95.5%   R-Sq(adj) = 95.1%

Analysis of Variance
Source      DF  SS     MS     F       P
Regression   1  14331  14331  231.77  0.000
Error       11    680     62
Total       12  15011
a.
b. Defects = - 17.7 + 0.355 Size
c. The slope coefficient, .355, is significantly different from zero.
d.
Examination of the residuals shows a curvilinear pattern.
e. The best model transforms the predictor variable and uses X2 as its independent variable.
The regression equation is
Defects = 4.70 + 0.00101 Sizesqr

Predictor  Coef        StDev       T      P
Constant   4.6973      0.9997      4.70   0.000
Sizesqr    0.00100793  0.00001930  52.22  0.000
S = 2.341 R-Sq = 99.6% R-Sq(adj) = 99.6%
Analysis of Variance
Source      DF  SS     MS     F        P
Regression   1  14951  14951  2727.00  0.000
Error       11     60      5
Total       12  15011
Unusual Observations
Obs  Sizesqr  Defects  Fit     StDev Fit  Residual  St Resid
  3    5625    6.000   10.367    0.920     -4.367    -2.03R
f. The slope coefficient, .00101, is significantly different from zero.
g.
Examination of the residuals shows a random pattern.
h. R denotes an observation with a large standardized residual
Fit     StDev Fit  95.0% CI            95.0% PI
95.411    1.173    (92.828, 97.994)    (89.645, 101.177)
i. Defects = 4.70 + 0.00101 Sizesqr
b. The regression equation is: Market = 60.7 + 0.414 Assessed
c. About 38% of the variation in market prices is explained by assessed values (the predictor variable). There is a considerable amount of unexplained variation.
15. a. The regression equation is: OpExpens = 18.9 + 1.30 PlayCosts

b. About 75% of the variation in operating expenses is explained by player costs.

c. p-value = .000 < .10. The regression is clearly significant at the 10% level.

d. The coefficient on player costs is 1.3. Is a coefficient of 1 reasonable? The hypothesis β1 = 1 is not supported by the data. It appears that operating expenses have a fixed-cost component, represented by the intercept, and then run about 1.3 times player costs.

e. The resulting interval is (47.6, 69.6).
f. Unusual Observations

Obs  PlayCost  OpExpens  Fit    SE Fit  Residual  St Resid
  7    18.0     60.00    42.31   1.64    17.69     3.45R

R denotes an observation with a large standardized residual. Team 7 has unusually low player costs relative to operating expenses.