Simple Regression With SPSS

Upload: yazidmrsmbp2000

Post on 30-May-2018


8/14/2019 Simple Regression With SPSS

    Example of Using SPSS to Generate a Simple Regression Analysis

Given the desire of a retail chain's management team to develop a strategy for forecasting annual sales, the following data have been gathered from a random sample of existing stores:

STORE   SQUARE FOOTAGE   ANNUAL SALES ($)
1       1726.00          3681.00
2       1642.00          3895.00
3       2816.00          6653.00
4       5555.00          9543.00
5       1292.00          3418.00
6       2208.00          5563.00
7       1313.00          3660.00
8       1102.00          2694.00
9       3151.00          5468.00
10      1516.00          2898.00
11      5161.00          10674.00
12      4567.00          7585.00
13      5841.00          11760.00
14      3008.00          4085.00

We can enter the data into SPSS by typing it directly into the Data Editor, or by cutting and pasting:


Next, by clicking on Variable View, we can apply variable and value labels where appropriate:

Assuming, for now, that if a relationship exists between the two variables, it is linear in nature, we can generate a simple Scatterplot (or Scatter Diagram) for the data. This is accomplished with the command sequence:


Which yields the following (editable) scatterplot:

[Figure: "Regression Analysis for Site Selection: Simple Scatterplot of Data"; X axis: Square Footage of Store (0 to 7000); Y axis: Sales Revenue of Store (0 to 14000)]

We can generate a simple straight-line equation from the output that results when using the Enter method in Regression:


Variables Entered/Removed(b)

Model   Variables Entered            Variables Removed   Method
1       Square Footage of Store(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Sales Revenue of Store

    Which yields:


Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .954(a)   .910       .902                936.8500

a. Predictors: (Constant), Square Footage of Store

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1.06E+08          1   106208119.7   121.009   .000(a)
Residual     10532255         12   877687.937
Total        1.17E+08         13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store

So then Ŷi = 901.247 + 1.686Xi (noting that no direct interpretation of the Y intercept at 0 square footage is possible, so the intercept represents the portion of annual sales varying due to factors other than store size), where b0 = 901.247 and b1 = 1.686 are taken from the Coefficients output:

Coefficients(a)

                                     Unstandardized Coefficients   Standardized Coefficients                     95% Confidence Interval for B
Model 1                              B         Std. Error          Beta                        t        Sig.     Lower Bound   Upper Bound
(Constant)               [b0]        901.247   513.023                                         1.757    .104     -216.534      2019.027
Square Footage of Store  [b1]        1.686     .153                .954                        11.000   .000     1.352         2.020

a. Dependent Variable: Sales Revenue of Store


SST (total sum of squares) = SSR (regression sum of squares) + SSE (error sum of squares)

SST = sum of the squared differences between each observed value of Y and Y-bar

SSR = sum of the squared differences between each predicted value of Y and Y-bar

SSE = sum of the squared differences between each observed value of Y and its corresponding predicted value

Coefficient of Determination = r² = SSR/SST = 0.91 (sample)

Standard Error of the Estimate = SYX = SQRT{ SSE / (n - 2) } = 936.85
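The coefficients and summary statistics above can be reproduced by hand from the store data. The following is a minimal Python sketch (an illustration, not part of the original SPSS workflow), using only the fourteen observations from the table at the start of this document:

```python
import math

# Store data from the table above: square footage (X) and annual sales (Y)
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares: SST = SSR + SSE
y_hat = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sst - sse

r_sq = ssr / sst                 # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")        # ~901.247, ~1.686
print(f"r^2 = {r_sq:.3f}, SYX = {s_yx:.2f}")  # ~0.910, ~936.85
```

The printed values agree with the SPSS Model Summary, ANOVA, and Coefficients output shown above.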

  • 8/14/2019 Simple Regression With SPSS

    7/19

    Testing the General Assumptions of Regression and Residual Analysis

1. Normality of Error - similar to the t-test and ANOVA, regression is robust to departure from the normality of errors around the regression line. This assumption is often tested by simply plotting the Standardized Residuals (each residual divided by its standard error) on a histogram with a superimposed normal distribution, or on a normal probability plot. SPSS allows us to perform both functions automatically (while, incidentally, saving the residual values in the original data file if this option is toggled):
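The standardized residuals that SPSS saves can also be sketched by hand: fit the line, take the raw residuals, and divide each by the standard error of the estimate. A minimal Python illustration (assuming this scaling, which matches the SPSS "Std. Residual" output later in this document):

```python
import math

# Store data from the table above
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Refit the least-squares line
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Raw residuals, then each divided by the standard error of the estimate
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(e * e for e in residuals) / (n - 2))
std_resid = [e / s_yx for e in residuals]

print(f"min = {min(std_resid):.3f}, max = {max(std_resid):.3f}")  # ~-2.015, ~1.143
```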


Of course, the assessment of normality by visually scanning the data leaves some statisticians unsettled, so I usually add an appropriate test of normality conducted on the data:

Variable        n    A-D     p-value
Stand._Resid.   14   0.348   0.503

[Figure: Histogram of the Regression Standardized Residuals with superimposed normal curve; Dependent Variable: Sales Revenue of Store; Std. Dev = .96, Mean = 0.00, N = 14.00]

[Figure: Normal P-P Plot of Regression Standardized Residual; Dependent Variable: Sales Revenue of Store; Observed vs. Expected Cumulative Probability]


2. Homoscedasticity - the assumption that the variability of data around the regression line is constant for all values of X. In other words, error must be independent of X. Generally, this assumption may be tested by plotting the X values against the raw residuals for Y. In SPSS, this must be done by plotting a Scatterplot from the saved variables:



This results in the residual data being automatically added to the data file:


    Then, simply produce the requisite scatterplot as before:

    Notice how there is no 'fanning' pattern to the data, implying homoscedasticity.

[Figure: Scatterplot of Unstandardized Residual (Y, -2000 to 2000) against Square Footage of Store (X, 1000 to 6000)]


Other authors, including those who wrote the SPSS routine, choose to plot the X values against the Studentized Residuals (Standardized Residuals adjusted for their distance from the average X value) rather than the Unstandardized (raw) Residuals. SPSS will generate this plot automatically (select this under the Plots panel):

[Figure: "Scatterplot of Studentized Residuals and Square Footage (X)"; Y axis: Studentized Residual (-2.5 to 1.5); X axis: Square Footage of Store (1000 to 6000)]

Note the equivalence of results between the two plots. Statistically speaking, the correlation between the X values and the residuals may be inferred to be 0.00. We can infer this using the correlation utility in SPSS, which tests the null hypothesis that the Pearson rho for the population is equal to 0.00:
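The zero correlation between X and the residuals is in fact a property of least squares, and it can be checked by hand. A small Python sketch (illustrative, not the SPSS utility itself):

```python
import math

# Store data from the table above
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def pearson_r(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))
    return num / den

r = pearson_r(x, residuals)
print(f"r(X, residual) = {r:.6f}")  # zero, up to floating-point error
```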


It should be noted that the distribution of the data also suggests that an assumption of linearity is reasonable at this point.

3) Independence of the Errors - assumes that no autocorrelation is present. Generally, this is evaluated by plotting the residuals in the order or sequence in which the original data were collected. This approach, when meaningful, uses the Durbin-Watson Statistic and associated tables of critical values. SPSS can generate this value when requested as part of the Model Summary:
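The Durbin-Watson statistic itself is just the sum of squared successive differences of the residuals divided by the sum of squared residuals. A quick Python sketch, taking the residuals in the listed store order (illustrative; the SPSS output below reports 2.446 for these data):

```python
# Store data from the table above, in the order collected
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Durbin-Watson: sum of squared successive differences over sum of squared residuals
dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / sum(e * e for e in residuals)
print(f"Durbin-Watson = {dw:.3f}")  # ~2.446; values near 2 suggest no autocorrelation
```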

A number of other statistics are also available in SPSS regarding Residual Analysis:

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .954(a)   .910       .902                936.8500

Change Statistics: R Square Change = .910; F Change = 121.009; df1 = 1; df2 = 12; Sig. F Change = .000
Durbin-Watson = 2.446

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store

Correlations

                              Square Footage   Unstandardized   Studentized
                              of Store         Residual         Residual
Square Footage of Store
  Pearson Correlation         1.000            .000             .015
  Sig. (2-tailed)             .                1.000            .959
  N                           14               14               14
Unstandardized Residual
  Pearson Correlation         .000             1.000            .999**
  Sig. (2-tailed)             1.000            .                .000
  N                           14               14               14
Studentized Residual
  Pearson Correlation         .015             .999**           1.000
  Sig. (2-tailed)             .959             .000             .
  N                           14               14               14

**. Correlation is significant at the 0.01 level (2-tailed).


Residuals Statistics(a)

                                   Minimum     Maximum     Mean        Std. Deviation   N
Predicted Value                    2759.3672   10749.96    5826.9286   2858.2959        14
Std. Predicted Value               -1.073      1.722       .000        1.000            14
Standard Error of Predicted Value  250.7362    512.8126    345.3026    81.3831          14
Adjusted Predicted Value           2771.8208   10518.55    5804.4373   2830.7178        14
Residual                           -1888.14    1070.6108   -3.25E-13   900.0964         14
Std. Residual                      -2.015      1.143       .000        .961             14
Stud. Residual                     -2.092      1.288       .011        1.035            14
Deleted Residual                   -2033.82    1442.1392   22.4913     1049.3911        14
Stud. Deleted Residual             -2.512      1.329       -.014       1.111            14
Mahal. Distance                    .003        2.967       .929        .901             14
Cook's Distance                    .001        .355        .086        .103             14
Centered Leverage Value            .000        .228        .071        .069             14

a. Dependent Variable: Sales Revenue of Store


    Inferences About the Model and Interval Estimates

We can determine the presence of a significant relationship between X and Y by testing whether the observed slope is significantly greater than 0, the hypothesized slope of the regression line if no relationship existed. This can be done with a t-test, which divides the observed slope by the standard error of the slope (supplied by SPSS):

or with an ANOVA model, which provides identical results:

noting that t², as expected, equals F, and the p-values are therefore equal. Note that SPSS also provides the confidence interval associated with the slope.
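The t statistic for the slope, and its agreement with the ANOVA F, can be verified by hand. A Python sketch (illustrative; for simple regression the standard error of the slope is SYX / sqrt(Sxx)):

```python
import math

# Store data from the table above
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# t = b1 / SE(b1), where SE(b1) = SYX / sqrt(Sxx); for simple regression t^2 = F
se_b1 = s_yx / math.sqrt(sxx)
t = b1 / se_b1
print(f"t = {t:.3f}, t^2 = {t * t:.3f}")  # ~11.000, ~121.009
```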

Finally, SPSS allows you to calculate and store both Confidence and Prediction Limits for the observed data. After you generate the scatterplot, left double-click on the chart; this will take you to the chart editor:

Coefficients(a)

                            Unstandardized Coefficients   Standardized Coefficients                     95% Confidence Interval for B
Model 1                     B         Std. Error          Beta                        t        Sig.     Lower Bound   Upper Bound
(Constant)                  901.247   513.023                                         1.757    .104     -216.534      2019.027
Square Footage of Store     1.686     .153                .954                        11.000   .000     1.352         2.020

a. Dependent Variable: Sales Revenue of Store

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1.06E+08          1   106208119.7   121.009   .000(a)
Residual     10532255         12   877687.937
Total        1.17E+08         13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store


    Next:

    Then:


    Click on Fit Options


LCL          UCL           LPL          UPL
3135.52558   4487.50548    1661.27256   5961.75850
2976.95430   4362.80609    1514.25297   5825.50741
5102.73145   6196.07384    3536.24581   7762.55948
9232.70820   11302.74446   7979.09247   12556.36019
2309.22155   3850.24435    897.92860    5261.53731
4028.95209   5219.51308    2497.98206   6750.48311
2349.56701   3880.71656    935.07592    5295.20765
1942.80866   3575.92595    560.87909    4957.85553
5663.35086   6765.16486    4100.00127   8328.51446
2737.79303   4177.06134    1293.06683   5621.78754
8677.59067   10529.18763   7362.03125   11844.74705
7827.42925   9376.22071    6418.64584   10785.00412
9632.63839   11867.28348   8422.94738   13076.97449
5426.83323   6519.44789    3860.07783   8086.20329
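The first row of these limits (store 1, X = 1726) can be checked by hand with the usual interval formulas. A Python sketch under those assumptions, with the critical value t(0.025, df = 12) = 2.1788 hard-coded since the standard library has no t distribution:

```python
import math

# Store data from the table above
x = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102, 3151, 1516, 5161, 4567, 5841, 3008]
y = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694, 5468, 2898, 10674, 7585, 11760, 4085]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(e * e for e in residuals) / (n - 2))

t_crit = 2.1788  # two-sided 95% critical value for t with n - 2 = 12 df
x0 = 1726        # square footage for store 1
y0 = b0 + b1 * x0
h = 1 / n + (x0 - x_bar) ** 2 / sxx         # leverage of x0
ci_half = t_crit * s_yx * math.sqrt(h)      # confidence limits for the mean response
pi_half = t_crit * s_yx * math.sqrt(1 + h)  # prediction limits for an individual store
print(f"LCL = {y0 - ci_half:.2f}, UCL = {y0 + ci_half:.2f}")  # ~3135.53, ~4487.51
print(f"LPL = {y0 - pi_half:.2f}, UPL = {y0 + pi_half:.2f}")  # ~1661.27, ~5961.76
```

Note that the prediction limits are much wider than the confidence limits, since they account for the scatter of an individual store around the line, not just the uncertainty in the line itself.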

Regression Analysis for Site Selection
Scatterplot of Data Including Confidence & Prediction Limits

[Figure: scatterplot of Sales Revenue of Store (Y, 2000 to 12000) against Square Footage of Store (X, 1000 to 6000), with the fitted line and the confidence & prediction bands; Rsq = 0.9098]