assumptions summer2003

Upload: ian-pratama

Post on 03-Jun-2018


  • 8/12/2019 Assumptions Summer2003

    1/119

    SW388R7

    Data Analysis &

    Computers II

    Slide 1

    Assumptions of multiple regression

    Assumption of normality

    Assumption of linearity

    Assumption of homoscedasticity

    Script for testing assumptions

    Practice problems

  • Slide 2

    Assumptions of Normality, Linearity, and Homoscedasticity

    Multiple regression assumes that the variables in the analysis satisfy the assumptions of normality, linearity, and homoscedasticity. (There is also an assumption of independence of errors, but that cannot be evaluated until the regression is run.)

    There are two general strategies for checking conformity to assumptions: pre-analysis and post-analysis. In pre-analysis, the variables are checked prior to running the regression. In post-analysis, the assumptions are evaluated by looking at the pattern of residuals (errors or variability) that the regression was unable to predict accurately.

    The text recommends pre-analysis, the strategy we will follow.

  • Slide 3

    Assumption of Normality

    The assumption of normality prescribes that the distribution of cases fit the pattern of a normal curve.

    It is evaluated for all metric variables included in the analysis, independent variables as well as the dependent variable.

    With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution.

    Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.

  • Slide 4

    Assumption of Normality: Evaluating Normality

    There are both graphical and statistical methods for evaluating normality.

    Graphical methods include the histogram and normality plot.

    Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0.

    None of the methods is absolutely definitive.

    We will use the criterion that the skewness and kurtosis of the distribution both fall between -1.0 and +1.0.
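
    The rule of thumb can be sketched in a few lines of Python. This is an illustration only, not part of the course scripts: the function names are hypothetical, and the bias-adjusted sample formulas used here are assumed to correspond to the skewness and kurtosis that SPSS reports.

```python
import math


def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from bias-adjusted sample formulas.

    Assumption: these formulas correspond to the statistics SPSS prints.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 cases")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z3 = sum(((x - mean) / s) ** 3 for x in values)
    z4 = sum(((x - mean) / s) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


def is_roughly_normal(values, limit=1.0):
    """Apply the rule of thumb: both statistics must lie in [-limit, +limit]."""
    skew, kurt = skewness_kurtosis(values)
    return -limit <= skew <= limit and -limit <= kurt <= limit


# Made-up data for illustration only.
bell_shaped = [1, 2, 2, 3, 3, 3, 4, 4, 5]
one_extreme_value = [1, 1, 1, 2, 2, 3, 4, 50]

print(is_roughly_normal(bell_shaped))
print(is_roughly_normal(one_extreme_value))
```

    The second list has a single very large value, which is enough to push the skewness well past +1.0.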

  • Slide 5

    Assumption of Normality: Histograms and Normality Plots

    [Figure: histogram and normal Q-Q plot of RS OCCUPATIONAL PRESTIGE SCORE (1980): Mean = 44.2, Std. Dev = 13.94, N = 255]

    [Figure: histogram and normal Q-Q plot of TIME SPENT USING E-MAIL: Mean = 3.6, Std. Dev = 6.14, N = 119]

    On the left side of the slide are the histogram and normality plot for occupational prestige, which could reasonably be characterized as normal. Time using email, on the right, is not normally distributed.

  • Slide 6

    Assumption of Normality: Hypothesis test of normality

    Tests of Normality

                                             Kolmogorov-Smirnov (a)      Shapiro-Wilk
    Variable                                 Statistic   df    Sig.      Statistic   df    Sig.
    RS OCCUPATIONAL PRESTIGE SCORE (1980)    .121        255   .000      .964        255   .000
    TIME SPENT USING E-MAIL                  .296        119   .000      .601        119   .000

    a. Lilliefors Significance Correction

    The hypothesis test for normality tests the null hypothesis that the variable is normal, i.e. the actual distribution of the variable fits the pattern we would expect if it is normal. If we fail to reject the null hypothesis, we conclude that the distribution is normal.

    The distributions for both of the variables depicted on the previous slide are associated with low significance values that lead to rejecting the null hypothesis and concluding that neither occupational prestige nor time using email is normally distributed.

  • Slide 7

    Assumption of Normality: Skewness, kurtosis, and normality

    Using the rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0, we would decide that occupational prestige is normally distributed and time using email is not.

    We will use this rule of thumb for normality in our strategy for solving problems.

  • Slide 8

    Assumption of Normality: Transformations

    When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis.

    Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation.

    All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
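
    As a rough sketch, the three transformations can be written in Python as follows. The formulas follow the variable labels that appear later in this deck (LG10, SQRT, and -1/x); the function name and sample values are made up, and the sketch assumes every value is strictly positive, since the logarithm and the inverse are undefined at zero.

```python
import math


def transform(values):
    """Return log10, square-root, and inverse versions of a variable.

    Assumption: every value is strictly positive.
    """
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive for these formulas")
    return {
        "log": [math.log10(x) for x in values],
        "sqrt": [math.sqrt(x) for x in values],
        # The negative sign keeps cases in their original order.
        "inverse": [-1.0 / x for x in values],
    }


# Hypothetical hours values with one very large case.
hours = [0.5, 1, 2, 4, 100]
for name, column in transform(hours).items():
    print(name, [round(v, 3) for v in column])
```

    Each transformed column keeps the cases in the same order as the original values while pulling the large value closer to the rest.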

  • Slide 9

    Assumption of Normality: Computing Transformations

    We will use SPSS scripts as described below to test assumptions and compute transformations.

    For additional details on the mechanics of computing transformations, see Computing Transformations

  • Slide 10

    Assumption of Normality: When transformations do not work

    When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.

    We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.

  • Slide 11

    Assumption of Normality: Computing Explore descriptive statistics

    To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.

  • Slide 12

    Assumption of Normality: Adding the variable to be evaluated

    First, click on the variable to be included in the analysis to highlight it.

    Second, click on the right arrow button to move the highlighted variable to the Dependent List.

  • Slide 13

    Assumption of Normality: Selecting statistics to be computed

    To select the statistics for the output, click on the Statistics command button.

  • Slide 14

    Assumption of Normality: Including descriptive statistics

    First, click on the Descriptives checkbox to select it. Clear the other checkboxes.

    Second, click on the Continue button to complete the request for statistics.

  • Slide 15

    Assumption of Normality: Selecting charts for the output

    To select the diagnostic charts for the output, click on the Plots command button.

  • Slide 16

    Assumption of Normality: Including diagnostic plots and statistics

    First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.

    Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.

    Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.

    Finally, click on the Continue button to complete the request.

  • Slide 17

    Assumption of Normality: Completing the specifications for the analysis

    Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.

  • Slide 18

    Assumption of Normality: The histogram

    [Figure: histogram of TOTAL TIME SPENT ON THE INTERNET: Mean = 10.7, Std. Dev = 15.35, N = 93]

    An initial impression of the normality of the distribution can be gained by examining the histogram.

    In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.

  • Slide 19

    Assumption of Normality: The normality plot

    [Figure: normal Q-Q plot of TOTAL TIME SPENT ON THE INTERNET (x-axis: Observed Value)]

    The problem with the normality of this variable's distribution is reinforced by the normality plot.

    If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.

  • Slide 20

    Assumption of Normality: The test of normality

    Tests of Normality

                                         Kolmogorov-Smirnov (a)      Shapiro-Wilk
    Variable                             Statistic   df    Sig.      Statistic   df    Sig.
    TOTAL TIME SPENT ON THE INTERNET     .246        93    .000      .606        93    .000

    a. Lilliefors Significance Correction

    Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.

    The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed.

    (Note: we report the probability as
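
    The choice of test and the decision rule on this slide can be sketched in Python. The function name is hypothetical, and the Sig. values must be read from the SPSS output; nothing here computes the tests themselves.

```python
def normality_decision(n, ks_sig, sw_sig, alpha=0.01):
    """Pick the test by sample size, then compare its Sig. value to alpha.

    Returns (name of the test used, True if the null hypothesis is rejected).
    """
    if n > 50:
        test, sig = "Kolmogorov-Smirnov", ks_sig
    else:
        test, sig = "Shapiro-Wilk", sw_sig
    reject = sig <= alpha
    return test, reject


# Values for total time spent on the Internet (N = 93).
# SPSS prints Sig. as .000; 0.0005 is a stand-in for "less than 0.001".
test, reject = normality_decision(n=93, ks_sig=0.0005, sw_sig=0.0005)
print(test, "-> not normal" if reject else "-> treat as normal")
```

    Rejecting the null hypothesis means the variable is treated as not normally distributed.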

  • Slide 21

    Assumption of Normality: The rule of thumb for skewness and kurtosis

    Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                                      Statistic    Std. Error
    Mean                                              10.7312      1.59183
    95% Confidence Interval for Mean   Lower Bound    7.5697
                                       Upper Bound    13.8927
    5% Trimmed Mean                                   8.2949
    Median                                            5.5000
    Variance                                          235.655
    Std. Deviation                                    15.35106
    Minimum                                           .20
    Maximum                                           102.00
    Range                                             101.80
    Interquartile Range                               10.2000
    Skewness                                          3.532        .250
    Kurtosis                                          15.614       .495

    Using the rule of thumb for evaluating normality with the skewness and kurtosis statistics, we look at the table of descriptive statistics.

    The skewness (3.532) and kurtosis (15.614) for the variable both exceed the rule of thumb criterion of 1.0. The variable is not normally distributed.

  • Slide 22

    Assumption of Linearity

    Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.

    Linearity characterizes the relationship between two metric variables. It is tested for the pairs formed by the dependent variable and each metric independent variable in the analysis.

    There are relationships that are not linear.

    The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.

    Population growth may not be linear. The pattern often shows growth at increasing rates over time.

  • Slide 23

    Assumption of Linearity: Evaluating linearity

    There are both graphical and statistical methods for evaluating linearity.

    Graphical methods include the examination of scatterplots, often overlaid with a trendline. While commonly recommended, this strategy is difficult to implement.

    Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
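
    To make the comparison of r and eta concrete, here is a minimal Python sketch. The data are made up, the function names are hypothetical, and eta is computed as the correlation ratio that treats each distinct value of the independent variable as its own group. The slides do not say how small the difference must be, so no cutoff is applied.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def eta(xs, ys):
    """Correlation ratio of y on x, grouping cases by distinct x value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    grand = sum(ys) / len(ys)
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    return math.sqrt(ss_between / ss_total)


# Made-up U-shaped relationship: strong, but not linear.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [9, 10, 4, 5, 1, 2, 4, 5, 9, 10]

print("r   =", round(pearson_r(xs, ys), 3))
print("eta =", round(eta(xs, ys), 3))
```

    A large gap between eta and the absolute value of r, as in this example, points to a relationship that a straight line does not capture.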

  • Slide 24

    Assumption of Linearity: Interpreting scatterplots

    [Figure: scatterplot with RESPONDENT'S SOCIOECONOMIC INDEX on the x-axis]

    The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.

  • Slide 25

    Assumption of Linearity: Interpreting scatterplots

    [Figure: scatterplot with Gross domestic product / capita on the x-axis]

    Sometimes, a scatterplot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatterplot.

  • Slide 26

    Assumption of Linearity: Scatterplots that are difficult to interpret

    [Figure: two scatterplots, with x-axes AGE OF RESPONDENT and HOURS PER DAY WATCHING TV]

    The correlations for both of these relationships are low.

    The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatterplots.

  • Slide 27

    Assumption of Linearity: Using correlation matrices

    Correlations

    Variables: (1) TOTAL TIME SPENT ON THE INTERNET; (2) AGE OF RESPONDENT; (3) Logarithm of AGE [LG10(AGE)]; (4) Square Root of AGE [SQRT(AGE)]; (5) Inverse of AGE [-1/(AGE)]

                                (1)      (2)      (3)      (4)      (5)
    (1)  Pearson Correlation    1        .017     .048     .032     .079
         Sig. (2-tailed)        .        .874     .648     .761     .453
         N                      93       93       93       93       93
    (2)  Pearson Correlation    .017     1        .979**   .995**   .916**
         Sig. (2-tailed)        .874     .        .000     .000     .000
         N                      93       270      270      270      270
    (3)  Pearson Correlation    .048     .979**   1        .994**   .978**
         Sig. (2-tailed)        .648     .000     .        .000     .000
         N                      93       270      270      270      270
    (4)  Pearson Correlation    .032     .995**   .994**   1        .951**
         Sig. (2-tailed)        .761     .000     .000     .        .000
         N                      93       270      270      270      270
    (5)  Pearson Correlation    .079     .916**   .978**   .951**   1
         Sig. (2-tailed)        .453     .000     .000     .000     .
         N                      93       270      270      270      270

    ** Correlation is significant at the 0.01 level (2-tailed).

    Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret.

    The information that we need is in the first column of the matrix, which shows the correlation and significance for the dependent variable and all forms of the independent variable.
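
    The first column of such a matrix can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's SPSS script: the function names and the data are made up, the independent variable is assumed to be strictly positive, and only the correlations are computed (significance values would still come from SPSS).

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def first_column(dependent, independent):
    """Correlate the dependent variable with each form of the independent one."""
    forms = {
        "original": list(independent),
        "log10": [math.log10(x) for x in independent],
        "sqrt": [math.sqrt(x) for x in independent],
        "inverse": [-1.0 / x for x in independent],
    }
    return {name: pearson_r(column, dependent) for name, column in forms.items()}


# Made-up data in which y rises roughly with the logarithm of x.
x = [1, 10, 100, 1000, 10000]
y = [1.0, 2.1, 2.9, 4.2, 5.0]

for name, r in first_column(y, x).items():
    print(f"{name:9s} r = {r:.3f}")
```

    Reading down the printed column shows at a glance which form of the independent variable has the strongest linear relationship with the dependent variable.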

  • Slide 28

    Assumption of Linearity: The pattern of correlations for no relationship

    [Table: the same correlation matrix for TOTAL TIME SPENT ON THE INTERNET and AGE OF RESPONDENT that appears on the previous slide]

    The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

    Moreover, none of the significance tests for the correlations with the transformed independent variable are statistically significant. There is no relationship between these variables; it is not a problem with non-linearity.

  • Slide 29

    Assumption of Linearity: Correlation pattern suggesting transformation

    Correlations

    Variables: (1) Infant mortality rate; (2) population; (3) Logarithm of POP [LG10(POP)]; (4) Square Root of POP [SQRT(POP)]; (5) Inverse of POP [-1/(POP)]

                                (1)      (2)      (3)      (4)      (5)
    (1)  Pearson Correlation    1        .021     .258**   .066     .198**
         Sig. (2-tailed)        .        .762     .000     .331     .003
         N                      219      219      219      219      219
    (2)  Pearson Correlation    .021     1        .282**   .897**   .038
         Sig. (2-tailed)        .762     .        .000     .000     .572
         N                      219      219      219      219      219
    (3)  Pearson Correlation    .258**   .282**   1        .600**   .606**
         Sig. (2-tailed)        .000     .000     .        .000     .000
         N                      219      219      219      219      219
    (4)  Pearson Correlation    .066     .897**   .600**   1        .164*
         Sig. (2-tailed)        .331     .000     .000     .        .015
         N                      219      219      219      219      219
    (5)  Pearson Correlation    .198**   .038     .606**   .164*    1
         Sig. (2-tailed)        .003     .572     .000     .015     .
         N                      219      219      219      219      219

    ** Correlation is significant at the 0.01 level (2-tailed).
    * Correlation is significant at the 0.05 level (2-tailed).

    The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

    However, the probability associated with the larger correlation for the logarithmic transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.

  • Slide 30

    Assumption of Linearity: Correlation pattern suggesting substitution

    Should it happen that the correlation between a transformed independent variable and the dependent variable is substantially stronger than the relationship between the untransformed independent variable and the dependent variable, the transformation should be considered even if the relationship involving the untransformed independent variable is statistically significant.

    A difference of +0.20 or -0.20, or more, would be considered substantial enough, since a change of this size would alter our interpretation of the relationship.
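
    One way to read the last two correlation patterns together is as a small decision rule, sketched here in Python. The function name is hypothetical, and the 0.05 significance level is an assumption chosen for illustration; the deck itself only fixes the 0.20 difference.

```python
def consider_transformation(r_orig, sig_orig, r_trans, sig_trans,
                            alpha=0.05, gap=0.20):
    """Return True if the transformed variable is worth considering.

    Two patterns qualify:
    - the original correlation is not significant but the transformed one is;
    - the transformed correlation is stronger by `gap` or more.
    alpha is an assumed significance level, not one stated on the slide.
    """
    rescues = sig_orig > alpha and sig_trans <= alpha
    much_stronger = abs(r_trans) - abs(r_orig) >= gap
    return rescues or much_stronger


# Infant mortality rate vs. population: r = .021, Sig. = .762.
# SPSS prints Sig. as .000; 0.0005 is a stand-in for "less than 0.001".
transformed = {
    "log10": (0.258, 0.0005),
    "sqrt": (0.066, 0.331),
    "inverse": (0.198, 0.003),
}
for name, (r, sig) in transformed.items():
    print(name, consider_transformation(0.021, 0.762, r, sig))
```

    Under these assumptions the logarithmic form qualifies on both counts, while the square root form qualifies on neither.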

  • Slide 31

    Assumption of Linearity: Transformations

    When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.

    Three common transformations to induce linearity are: the logarithmic transformation, the square root transformation, and the inverse transformation.

    All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.

  • Slide 32

    Assumption of Linearity: When transformations do not work

    When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship, i.e. we lose power.

    We do have the option of changing the way the information in the variables is represented, e.g. substitute several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
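
    Substituting dichotomous variables for a metric variable can be sketched as follows. The cut points, labels, and function name are hypothetical choices made for this illustration; the slides do not say how the categories should be formed.

```python
import bisect


def dummy_code(values, cut_points, labels):
    """Bin a metric variable and return one 0/1 column per bin.

    The first bin is the reference category and gets no column.
    cut_points must be sorted; a value equal to a cut point goes to
    the higher bin.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    bins = [bisect.bisect_right(cut_points, v) for v in values]
    return {label: [1 if b == i else 0 for b in bins]
            for i, label in enumerate(labels) if i > 0}


# Hypothetical hours per week on the Internet, with made-up cut points.
hours = [0.2, 3, 8, 15, 102]
columns = dummy_code(hours, cut_points=[5, 20],
                     labels=["low", "medium", "high"])
for name, column in columns.items():
    print(name, column)
```

    Each case scores 1 on at most one column, and cases in the reference category score 0 on all of them.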

  • Slide 33

    Assumption of Linearity: Creating the scatterplot

    Suppose we are interested in the linearity of the relationship between "hours per day watching TV" and "total hours spent on the Internet".

    The most commonly recommended strategy for evaluating linearity is visual examination of a scatter plot.

    To obtain a scatter plot in SPSS, select the Scatter command from the Graphs menu.

  • Slide 34

    Assumption of Linearity: Selecting the type of scatterplot

    First, click on the thumbnail sketch of a simple scatterplot to highlight it.

    Second, click on the Define button to specify the variables to be included in the scatterplot.

  • Slide 35

    Assumption of Linearity: Selecting the variables

    First, move the dependent variable netime to the Y Axis text box.

    Second, move the independent variable tvhours to the X Axis text box.

    If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.

    Third, click on the OK button to complete the specifications for the scatterplot.

  • Slide 36

    Assumption of Linearity: The scatterplot

    The scatterplot is produced in the SPSS output viewer.

    The points in a scatterplot are considered linear if they form a cigar-shaped elliptical band.

    The pattern in this scatterplot is not really clear.

  • Slide 37

    Assumption of Linearity: Adding a trendline

    To try to determine if the relationship is linear, we can add a trendline to the chart.

    To add a trendline to the chart, we need to open the chart for editing.

    To open the chart for editing, double click on it.

  • Slide 38

    Assumption of Linearity: The scatterplot in the SPSS Chart Editor

    The chart that we double clicked on is opened for editing in the SPSS Chart Editor.

    To add the trendline, select the Options command from the Chart menu.

  • Slide 39

    Assumption of Linearity: Requesting the fit line

    In the Scatterplot Options dialog box, we click on the Total checkbox in the Fit Line panel in order to request the trend line.

    Click on the Fit Options button to request the r² coefficient of determination as a measure of the strength of the relationship.

  • Slide 40

    Assumption of Linearity: Requesting r²

    First, the Linear regression thumbnail sketch should be highlighted as the type of fit line to be added to the chart.

    Second, click on the Display R-square in Legend checkbox to add this item to our output.

    Third, click on the Continue button to complete the options request.

  • Slide 42

    Assumption of Linearity: The fit line and r²

    The red fit line is added to the chart.

    The value of r² (0.0460) suggests that the relationship is weak.
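
    As a quick arithmetic check, the value 0.0460 can be converted back to a correlation, assuming (as the fit-line steps on the preceding slides indicate) that it is the R-square shown in the chart legend. The variable names are illustrative.

```python
import math

r_squared = 0.0460           # value shown in the chart legend
r = math.sqrt(r_squared)     # magnitude only; the sign comes from the slope

print(f"|r| is about {r:.2f}")
print(f"share of variance explained: {r_squared:.1%}")
```

    A correlation of this size is in line with the .215 reported for hours per day watching TV in the correlation matrix later in the deck, and it means the fit line accounts for under 5% of the variance.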

  • Slide 43

    Assumption of Linearity: Computing the transformations

    There are four transformations that we can use to achieve or improve linearity.

    The compute dialogs for these four transformations for linearity are shown.

  • Slide 44

    Assumption of Linearity: Creating the scatterplot matrix

    To create the scatterplot matrix, select the Scatter command in the Graphs menu.

  • Slide 45

    Assumption of Linearity: Selecting type of scatterplot

    First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want.

    Second, click on the Define button to select the variables for the scatterplot.

  • Slide 46

    Assumption of Linearity: Specifications for scatterplot matrix

    First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box.

    Second, click on the OK button to produce the scatterplot.

  • Slide 47

    Assumption of Linearity: The scatterplot matrix

    [Figure: scatterplot matrix of TOTAL TIME SPENT ON THE INTERNET, HOURS PER DAY WATCHING TV, LGTVHOUR, SQTVHOUR, INTVHOUR, and S2TVHOUR]

    The scatterplot matrix shows a thumbnail sketch of scatterplots for each independent variable or transformation with the dependent variable. The scatterplot matrix may suggest which transformations might be useful.

  • Slide 48

    Assumption of Linearity: Creating the correlation matrix

    To create the correlation matrix, select the Correlate | Bivariate command in the Analyze menu.

  • Slide 49

    Assumption of Linearity: Specifications for correlation matrix

    First, move the dependent variable, the independent variable and all of the transformations to the Variables list box.

    Second, click on the OK button to produce the correlation matrix.

  • Slide 50

    Assumption of Linearity: The correlation matrix

    Correlations

    Variables: (1) TOTAL TIME SPENT ON THE INTERNET; (2) HOURS PER DAY WATCHING TV; (3) LGTVHOUR; (4) SQTVHOUR; (5) INTVHOUR; (6) S2TVHOUR

                                (1)      (2)      (3)      (4)      (5)      (6)
    (1)  Pearson Correlation    1        .215     .104     .156     .045     .328**
         Sig. (2-tailed)        .        .079     .397     .203     .713     .006
         N                      93       68       68       68       68       68
    (2)  Pearson Correlation    .215     1        .874**   .967**   .626**   .903**
         Sig. (2-tailed)        .079     .        .000     .000     .000     .000
         N                      68       160      160      160      160      160
    (3)  Pearson Correlation    .104     .874**   1        .967**   .910**   .611**
         Sig. (2-tailed)        .397     .000     .        .000     .000     .000
         N                      68       160      160      160      160      160
    (4)  Pearson Correlation    .156     .967**   .967**   1        .784**   .774**
         Sig. (2-tailed)        .203     .000     .000     .        .000     .000
         N                      68       160      160      160      160      160
    (5)  Pearson Correlation    .045     .626**   .910**   .784**   1        .335**
         Sig. (2-tailed)        .713     .000     .000     .000     .        .000
         N                      68       160      160      160      160      160
    (6)  Pearson Correlation    .328**   .903**   .611**   .774**   .335**   1
         Sig. (2-tailed)        .006     .000     .000     .000     .000     .
         N                      68       160      160      160      160      160

    ** Correlation is significant at the 0.01 level (2-tailed).

    The answers to the problems are based on the correlation matrix.

    Before we answer the question in this problem, we will use a script to produce the output.

  • 8/12/2019 Assumptions Summer2003

    51/119

    SW388R7

    Data Analysis &

    Computers II

    Slide 51

    Assumption of Homoscedasticity

    Homoscedasticity refers to the assumption that thedependent variable exhibits similar amounts of

    variance across the range of values for an

    independent variable.

    While it applies to independent variables at all three

    measurement levels, the methods that we will use to

    evaluation homoscedasticity requires that the

    independent variable be non-metric (nominal orordinal) and the dependent variable be metric

    (ordinal or interval). When both variables are

    metric, the assumption is evaluated as part of the

    residual analysis in multiple regression.


    Slide 52

    Assumption of Homoscedasticity: Evaluating homoscedasticity

    Homoscedasticity is evaluated for pairs of variables.

    There are both graphical and statistical methods for evaluating homoscedasticity.

    The graphical method is called a boxplot.

    The statistical method is the Levene statistic, which SPSS computes for the test of homogeneity of variances.

    Neither of the methods is absolutely definitive.


    Slide 53

    Assumption of Homoscedasticity: The boxplot

    [Figure: boxplot by MARITAL STATUS for the groups MARRIED (N = 138), WIDOWED (N = 20), DIVORCED (N = 42), SEPARATED (N = 11), and NEVER MARRIED (N = 56). The vertical axis runs from -1 to 5. Outlier case labels omitted.]

    Each red box shows the middle 50% of the cases for the group, indicating how spread out the group of scores is.

    If the variance across the groups is equal, the height of the red boxes will be similar across the groups.

    If the heights of the red boxes are different, the plot suggests that the variance across groups is not homogeneous.

    The married group is more spread out than the other groups, suggesting unequal variance.
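    The height of each box is the interquartile range of the group, so the visual comparison can be mimicked numerically. The following is only a sketch: the function name `box_height` and the group scores are hypothetical, and Python's quartile rule may differ slightly from the hinges SPSS draws.

```python
import statistics


def box_height(scores):
    """Return Q3 - Q1, the spread covered by the box in a boxplot."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


# Hypothetical scores for two groups (illustrative only, not the GSS data).
groups = {
    "married": [0, 1, 1, 2, 3, 3, 4],
    "widowed": [1, 1, 1, 1, 2, 2, 2],
}

heights = {name: box_height(scores) for name, scores in groups.items()}
for name, height in heights.items():
    print(f"{name}: box height (IQR) = {height}")
```

    Clearly different box heights across groups are the numeric counterpart of the unequal boxes described for the plot. As with the plot itself, the comparison is suggestive rather than definitive.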


    Slide 54

    Assumption of Homoscedasticity: Levene test of the homogeneity of variance

    Test of Homogeneity of Variances: RS HIGHEST DEGREE

    Levene Statistic   df1   df2   Sig.
    5.239              4     262   .000

    The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.

    Since the probability associated with the Levene Statistic(


    Slide 55

    Assumption of Homoscedasticity: Transformations

    When the assumption of homoscedasticity is not supported, we can transform the dependent variable and test the transformed variable for homoscedasticity. If the transformed variable demonstrates homoscedasticity, we can substitute it in our analysis.

    We use the same three common transformations that we used for normality: the logarithmic transformation, the square root transformation, and the inverse transformation.

    All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
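    As a rough illustration of the three transformations, here is a minimal Python sketch. The function name `transform` is hypothetical, and writing the inverse as -1/x (so that the order of the cases is preserved) is an assumption about how the transformed variable is computed.

```python
import math


def transform(values, kind):
    """Apply one of the three transformations to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("shift the variable so that every value is positive first")
    if kind == "log":
        return [math.log10(v) for v in values]
    if kind == "sqrt":
        return [math.sqrt(v) for v in values]
    if kind == "inverse":
        # Negated so that larger original values stay larger after transforming.
        return [-1.0 / v for v in values]
    raise ValueError(f"unknown transformation: {kind}")


# Hypothetical right-skewed scores: the large value is pulled in by each transformation.
scores = [1, 4, 9, 100]
for kind in ("log", "sqrt", "inverse"):
    print(kind, transform(scores, kind))
```

    Each transformation keeps the cases in the same order and only changes the spacing between them, which is why the transformed variable can stand in for the original.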


    Slide 56

    Assumption of Homoscedasticity: When transformations do not work

    When none of the transformations results in homoscedasticity for the variables in the relationship, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e., we lose power.


    Slide 57

    Assumption of Homoscedasticity: Request a boxplot

    Suppose we want to test for homogeneity of variance: whether the variance in "highest academic degree" is homogeneous for the categories of "marital status."

    The boxplot provides a visual image of the distribution of the dependent variable for the groups defined by the independent variable.

    To request a boxplot, choose the Boxplot command from the Graphs menu.


    Slide 58

    Assumption of Homoscedasticity: Specify the type of boxplot

    First, click on the Simple style of boxplot to highlight it with a rectangle around the thumbnail drawing.

    Second, click on the Define button to specify the variables to be plotted.


    Slide 59

    Assumption of Homoscedasticity: Specify the dependent variable

    First, click on the dependent variable to highlight it.

    Second, click on the right arrow button to move the dependent variable to the Variable text box.


    Slide 60

    Assumption of Homoscedasticity: Specify the independent variable

    First, click on the independent variable to highlight it.

    Second, click on the right arrow button to move the independent variable to the Category Axis text box.


    Slide 61

    Assumption of Homoscedasticity: Complete the request for the boxplot

    To complete the request for the boxplot, click on the OK button.


    Slide 63

    Assumption of Homoscedasticity: Request the test for homogeneity of variance

    To compute the Levene test for homogeneity of variance, select the Compare Means | One-Way ANOVA command from the Analyze menu.


    Slide 65

    Assumption of Homoscedasticity: Specify the dependent variable

    First, click on the dependent variable to highlight it.

    Second, click on the right arrow button to move the dependent variable to the Dependent List text box.


    Slide 66

    Assumption of Homoscedasticity: The homogeneity of variance test is an option

    Click on the Options button to open the options dialog box.


    Slide 68

    Assumption of Homoscedasticity: Complete the request for output

    Click on the OK button to complete the request for the homogeneity of variance test through the one-way ANOVA procedure.


    Slide 69

    Assumption of Homoscedasticity: Interpreting the homogeneity of variance test

    Test of Homogeneity of Variances: RS HIGHEST DEGREE

    Levene Statistic   df1   df2   Sig.
    5.239              4     262   .000

    The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.

    Since the probability associated with the Levene Statistic(


    Slide 70

    Using scripts

    The process of evaluating assumptions requires numerous SPSS procedures and outputs that are time consuming to produce.

    These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.

    Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating assumptions.


    Slide 71

    Using a script for evaluating assumptions

    The script EvaluatingAssumptionsAndMissingData.exe will produce all of the output we have used for evaluating assumptions.

    Navigate to the link SPSS Scripts and Syntax on the course web page.

    Download the script file EvaluatingAssumptionsAndMissingData.exe to your computer and install it, following the directions on the web page.


    Slide 73

    Invoke the script in SPSS

    To invoke the script, select the Run Script command in the Utilities menu.


    Slide 74

    Select the script

    First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder.

    If you only see a file with an .EXE extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.

    Second, click on the script name to highlight it.

    Third, click on the Run button to start the script.


    Slide 75

    The script dialog

    The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.


    Slide 76

    Complete the specifications - 1

    Move the dependent and independent variables from the list of variables to the list boxes. Metric and nonmetric variables are moved to separate lists so the computer knows how you want them treated.

    You must also indicate the level of measurement for the dependent variable. By default the metric option button is marked.


    Slide 77

    Complete the specifications - 2

    Mark the option button for the type of output you want the script to compute.

    Select the transformations to be tested.

    Click on the OK button to produce the output.


    Slide 78

    The script finishes

    If your SPSS output viewer is open, you will see the output produced in that window.

    Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished.

    Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.

    When you see this alert, click on the OK button.


    Slide 79

    Output from the script - 1

    The script will produce lots of output. Additional descriptive material in the titles should help link specific outputs to specific tasks.

    Scroll through the output to locate the results needed to answer the question.


    Slide 80

    Closing the script dialog box

    The script dialog box does not close automatically because we often want to run another test right away. There are two methods for closing the dialog box.

    Click on the Cancel button to close the script.

    Click on the X close box to close the script.


    Slide 81

    Problem 1

    In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.

    In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

    1. True

    2. True with caution

    3. False

    4. Inappropriate application of a statistic


    Slide 82

    Level of measurement

    9. In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.

    In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

    Since we are pre-screening for a multiple regression problem, we should make sure we satisfy the level of measurement requirements before proceeding.

    "Total hours spent on the Internet" [netime] is interval, satisfying the metric level of measurement requirement for the dependent variable.

    "Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

    "Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

    "Income" [rincom98] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables. Since some data analysts do not agree with this convention of treating an ordinal variable as metric, a note of caution should be included in our interpretation.


    Slide 84

    Run the script to test normality - 2

    First, navigate to the SW388R7 folder on your computer.

    Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS

    Third, click on the Run button to open the script.


    Slide 85

    Run the script to test normality - 3

    First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

    Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

    Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

    Fourth, click on the OK button to produce the output.


    Slide 86

    Normality of the dependent variable

    Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                          Statistic   Std. Error
    Mean                                  10.7312     1.59183
    95% Confidence Interval for Mean
      Lower Bound                         7.5697
      Upper Bound                         13.8927
    5% Trimmed Mean                       8.2949
    Median                                5.5000
    Variance                              235.655
    Std. Deviation                        15.35106
    Minimum                               .20
    Maximum                               102.00
    Range                                 101.80
    Interquartile Range                   10.2000
    Skewness                              3.532       .250
    Kurtosis                              15.614      .495

    The dependent variable "total hours spent on the Internet" [netime] did not satisfy the criteria for a normal distribution. Both the skewness (3.532) and kurtosis (15.614) fell outside the range from -1.0 to +1.0.
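    The rule of thumb applied here (skewness and kurtosis both between -1.0 and +1.0) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the bias-adjusted sample formulas below are assumed to correspond to the skewness and kurtosis that SPSS prints in the Descriptives table.

```python
import math


def skewness_kurtosis(values):
    """Return the bias-adjusted sample skewness and excess kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four cases")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    sd = math.sqrt(sum(d * d for d in deviations) / (n - 1))
    if sd == 0:
        raise ValueError("all values are identical")
    z3 = sum((d / sd) ** 3 for d in deviations)
    z4 = sum((d / sd) ** 4 for d in deviations)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


def meets_rule_of_thumb(skew, kurt, limit=1.0):
    """True when both statistics fall between -limit and +limit."""
    return -limit <= skew <= limit and -limit <= kurt <= limit


# Statistics reported on the slides for netime and for its log transformation.
print(meets_rule_of_thumb(3.532, 15.614))   # untransformed netime
print(meets_rule_of_thumb(-0.150, 0.127))   # LG10(NETIME)
```

    Applied to the reported statistics, the check fails for the untransformed variable and passes for the logarithmic transformation, matching the conclusions drawn on the slides.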


    Slide 87

    Normality of transformed dependent variable

    Since "total hours spent on the Internet" [netime] did not satisfy the criteria for normality, we examine the skewness and kurtosis of each of the transformations to see if any of them satisfy the criteria.

    The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.150) was between -1.0 and +1.0 and the kurtosis of the distribution (0.127) was between -1.0 and +1.0.

    The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was substituted for "total hours spent on the Internet" [netime] in the analysis.


    Slide 89

    Normality of the independent variables - 2

    Descriptives: RESPONDENTS INCOME

                                          Statistic   Std. Error
    Mean                                  13.35       .419
    95% Confidence Interval for Mean
      Lower Bound                         12.52
      Upper Bound                         14.18
    5% Trimmed Mean                       13.54
    Median                                15.00
    Variance                              29.535
    Std. Deviation                        5.435
    Minimum                               1
    Maximum                               23
    Range                                 22
    Interquartile Range                   8.00
    Skewness                              -.686       .187
    Kurtosis                              -.253       .373

    The independent variable "income" [rincom98] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.686) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.253) was between -1.0 and +1.0.


    Slide 90

    Run the script to test linearity - 1

    If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

    First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

    When the linearity option is selected, a default set of transformations to test is marked.


    Slide 91

    Run the script to test linearity - 2

    Since we have already decided to use the log of the dependent variable to satisfy normality, that is the form of the dependent variable we want to evaluate with the independent variables. Mark the logarithmic transformation checkbox for the dependent variable and clear the others.

    Click on the OK button to produce the output.


    Slide 92

    Linearity test with age of respondent

    Correlations

    Variables: (1) Logarithm of NETIME [LG10(NETIME)]; (2) AGE OF RESPONDENT; (3) Logarithm of AGE [LG10(AGE)]; (4) Square Root of AGE [SQRT(AGE)]; (5) Inverse of AGE [-1/(AGE)]

                                 (1)      (2)      (3)      (4)      (5)
    (1)  Pearson Correlation     1        .074     .119     .096     .164
         Sig. (2-tailed)         .        .483     .257     .362     .116
         N                       93       93       93       93       93
    (2)  Pearson Correlation     .074     1        .979**   .995**   .916**
         Sig. (2-tailed)         .483     .        .000     .000     .000
         N                       93       270      270      270      270
    (3)  Pearson Correlation     .119     .979**   1        .994**   .978**
         Sig. (2-tailed)         .257     .000     .        .000     .000
         N                       93       270      270      270      270
    (4)  Pearson Correlation     .096     .995**   .994**   1        .951**
         Sig. (2-tailed)         .362     .000     .000     .        .000
         N                       93       270      270      270      270
    (5)  Pearson Correlation     .164     .916**   .978**   .951**   1
         Sig. (2-tailed)         .116     .000     .000     .000     .
         N                       93       270      270      270      270

    ** Correlation is significant at the 0.01 level (2-tailed).

    The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "age" [age] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=0.074, p=0.483), the logarithmic transformation (r=0.119, p=0.257), the square root transformation (r=0.096, p=0.362), and the inverse transformation (r=0.164, p=0.116), were all greater than the level of significance for testing assumptions (0.01).

    There was no evidence that the assumption of linearity was violated.
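    The first row of the correlation matrix can be sketched as follows: correlate the dependent variable with the independent variable and with each of its transformed forms. The function names and the data are hypothetical, and the sketch stops at the correlation coefficients; the probabilities shown in the table come from SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlations_by_form(dependent, independent):
    """Correlate the dependent variable with each form of a positive independent variable."""
    forms = {
        "untransformed": independent,
        "logarithm": [math.log10(v) for v in independent],
        "square root": [math.sqrt(v) for v in independent],
        "inverse": [-1.0 / v for v in independent],
    }
    return {name: pearson_r(dependent, values) for name, values in forms.items()}


# Hypothetical data in which the dependent variable follows the log of age exactly.
ages = [20, 30, 40, 50, 60, 70]
outcome = [math.log10(a) for a in ages]
for form, r in correlations_by_form(outcome, ages).items():
    print(f"{form}: r = {r:.3f}")
```

    In this made-up example the logarithmic form correlates more strongly than the untransformed variable, which is the pattern that would point to a nonlinear relationship. When every form shows a weak, non-significant correlation, as in the table above, the relationship is weak rather than nonlinear.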


    Slide 93

    Linearity test with respondent's income

    Correlations

    Variables: (1) Logarithm of NETIME [LG10(NETIME)]; (2) RESPONDENTS INCOME; (3) Logarithm of Reflected Values of RINCOM98 [LG10(24-RINCOM98)]; (4) Square Root of Reflected Values of RINCOM98 [SQRT(24-RINCOM98)]; (5) Inverse of Reflected Values of RINCOM98 [-1/(24-RINCOM98)]

                                 (1)      (2)      (3)      (4)      (5)
    (1)  Pearson Correlation     1        -.053    .063     .060     .073
         Sig. (2-tailed)         .        .658     .600     .617     .540
         N                       93       72       72       72       72
    (2)  Pearson Correlation     -.053    1        -.922**  -.985**  -.602**
         Sig. (2-tailed)         .658     .        .000     .000     .000
         N                       72       168      168      168      168
    (3)  Pearson Correlation     .063     -.922**  1        .974**   .848**
         Sig. (2-tailed)         .600     .000     .        .000     .000
         N                       72       168      168      168      168
    (4)  Pearson Correlation     .060     -.985**  .974**   1        .714**
         Sig. (2-tailed)         .617     .000     .000     .        .000
         N                       72       168      168      168      168
    (5)  Pearson Correlation     .073     -.602**  .848**   .714**   1
         Sig. (2-tailed)         .540     .000     .000     .000     .
         N                       72       168      168      168      168

    ** Correlation is significant at the 0.01 level (2-tailed).

    The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "income" [rincom98] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=-0.053, p=0.658), the logarithmic transformation (r=0.063, p=0.600), the square root transformation (r=0.060, p=0.617), and the inverse transformation (r=0.073, p=0.540), were all greater than the level of significance for testing assumptions (0.01).

    There was no evidence that the assumption of linearity was violated.

    Slide 94

    Run the script to test homogeneity of variance - 1

    If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

    First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity.

    When the homogeneity of variance option is selected, a default set of transformations to test is marked.

    Slide 95

    Run the script to test homogeneity of variance - 2

    In this problem, we have already decided to use the log transformation for the dependent variable, so we only need to test it. Next, clear all of the transformation checkboxes except for Logarithmic.

    Finally, click on the OK button to produce the output.


    Slide 96

    Levene test of homogeneity of variance

    Test of Homogeneity of Variances: Logarithm of NETIME [LG10(NETIME)]

    Levene Statistic   df1   df2   Sig.
    .166               1     91    .685

    Based on the Levene Test, the variance in "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.166) was p=0.685, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

    The homogeneity of variance assumption was satisfied.
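    For readers curious about what lies behind the Levene statistic, the sketch below computes it from absolute deviations around each group mean and then applies the decision rule used on this slide. The function names and the example scores are hypothetical. Converting the statistic to a probability requires the F distribution with df1 = k - 1 and df2 = N - k, which is left to SPSS here.

```python
def levene_statistic(groups):
    """Levene's W based on absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Replace each score by its absolute distance from its own group mean.
    z_groups = []
    for g in groups:
        mean = sum(g) / len(g)
        z_groups.append([abs(v - mean) for v in g])
    z_means = [sum(z) / len(z) for z in z_groups]
    z_grand = sum(sum(z) for z in z_groups) / n_total
    between = sum(len(z) * (zm - z_grand) ** 2 for z, zm in zip(z_groups, z_means))
    within = sum((v - zm) ** 2 for z, zm in zip(z_groups, z_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


def variances_homogeneous(p_value, alpha=0.01):
    """Fail to reject the null hypothesis of equal variances when p > alpha."""
    return p_value > alpha


# Hypothetical scores: the second group is twice as spread out as the first.
print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
# Decision for the probability reported on this slide (p = 0.685, alpha = 0.01).
print(variances_homogeneous(0.685))
```

    With two groups and 93 cases, the degrees of freedom are 1 and 91, which agrees with the table on this slide.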


    Slide 97

    Answer 1

    In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

    1. True

    2. True with caution

    3. False

    4. Inappropriate application of a statistic

    The logarithmic transformation of the dependent variable [LGNETIME=LG10(NETIME)] solved the only problem with normality that we encountered. In that form, the relationship with the metric independent variables was weak, but there was no evidence of nonlinearity. The variance of the log transform of the dependent variable was homogeneous for the categories of the nonmetric variable sex.

    No cautions were needed because of a violation of assumptions. A caution was needed because respondent's income was ordinal level.

    The answer to the problem is true with caution.


    Slide 100

    Run the script to test normality - 1

    To run the script to test assumptions, choose the Run Script command from the Utilities menu.


    Slide 101

    Run the script to test normality - 2

    First, navigate to the SW388R7 folder on your computer.

    Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS

    Third, click on the Run button to open the script.

    Slide 104

    Normality of the first independent variable

    Descriptives: Population growth rate

                                          Statistic   Std. Error
    Mean                                  1.4944      .09456
    95% Confidence Interval for Mean
      Lower Bound                         1.3081
      Upper Bound                         1.6808
    5% Trimmed Mean                       1.4365
    Median                                1.4000
    Variance                              1.958
    Std. Deviation                        1.39929
    Minimum                               -1.14
    Maximum                               13.39
    Range                                 14.53
    Interquartile Range                   1.8000
    Skewness                              2.885       .164
    Kurtosis                              22.665      .327

    The independent variable "population growth rate" [pgrowth] did not satisfy the criteria for a normal distribution. Both the skewness (2.885) and kurtosis (22.665) fell outside the range from -1.0 to +1.0.

    Slide 105

    Normality of transformed independent variable

    None of the transformations induced normality in the variable "population growth rate" [pgrowth]: not the logarithmic (skew=-0.218, kurtosis=1.277), the square root (skew=0.873, kurtosis=5.273), or the inverse transformation (skew=-1.836, kurtosis=5.763).

    A caution was added to the findings.

    Slide 107

    Normality of transformed independent variable

    Since the distribution was skewed to the left, it was necessary to reflect, or reverse code, the values for the variable before computing the transformation.

    The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.567) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.964) was between -1.0 and +1.0. The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was substituted for "percent of the total population who was literate" [literacy] in the analysis.
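    Reflection can be sketched as subtracting every value from one more than the largest value, so the largest original score becomes 1 and the long tail moves to the right. The function names below are hypothetical, and the literacy percentages are made up. The "maximum + 1" constant is inferred from the 101-LITERACY expression above.

```python
import math


def reflect(values):
    """Reverse code the values: subtract each one from (maximum + 1)."""
    pivot = max(values) + 1
    return [pivot - v for v in values]


def reflected_sqrt(values):
    """Square root of the reflected values, as in SQRT(101-LITERACY)."""
    return [math.sqrt(v) for v in reflect(values)]


# Hypothetical literacy percentages with a maximum of 100, so the pivot is 101.
literacy = [100, 85, 37]
print(reflect(literacy))
print(reflected_sqrt(literacy))
```

    Because reflection reverses the order of the cases, the sign of any relationship with the reflected variable is reversed as well, which should be kept in mind when interpreting the results.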

    Slide 108

    Normality of the third independent variable

    Descriptives: Per capita GDP

                                          Statistic   Std. Error
    Mean                                  8554.43     580.523
    95% Confidence Interval for Mean
      Lower Bound                         7410.27
      Upper Bound                         9698.59
    5% Trimmed Mean                       7818.67
    Median                                5000.00
    Variance                              7.4E+07
    Std. Deviation                        8590.954
    Minimum                               510
    Maximum                               36400
    Range                                 35890
    Interquartile Range                   11200.00
    Skewness                              1.207       .164
    Kurtosis                              .475        .327

    The independent variable "per capita GDP" [gdp] did not satisfy the criteria for a normal distribution. The kurtosis of the distribution (0.475) was between -1.0 and +1.0, but the skewness of the distribution (1.207) fell outside the range from -1.0 to +1.0.

    Slide 109

    Normality of transformed independent variable

    The "square root of per capita GDP [SQGDP=SQRT(GDP)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.614) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.773) was between -1.0 and +1.0.

    The "square root of per capita GDP [SQGDP=SQRT(GDP)]" was substituted for "per capita GDP" [gdp] in the analysis.

    Slide 110

    Run the script to test linearity - 1

    If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

    First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

    When the linearity option is selected, a default set of transformations to test is marked.

    Click on the OK button to produce the output.

    Slide 111

    Linearity test with population growth rate

    Correlations

    Variables:
    (1) Life expectancy at birth - total population
    (2) Population growth rate
    (3) Logarithm of PGROWTH [LG10(2.14+PGROWTH)]
    (4) Square Root of PGROWTH [SQRT(2.14+PGROWTH)]
    (5) Inverse of PGROWTH [-1/(2.14+PGROWTH)]

                                   (1)       (2)       (3)       (4)       (5)
    (1)  Pearson Correlation        1      -.262**   -.314**   -.301**   -.282**
         Sig. (2-tailed)            .       .000      .000      .000      .000
         N                         219      219       219       219       219
    (2)  Pearson Correlation     -.262**     1        .930**    .979**    .801**
         Sig. (2-tailed)          .000       .        .000      .000      .000
         N                         219      219       219       219       219
    (3)  Pearson Correlation     -.314**    .930**     1        .985**    .956**
         Sig. (2-tailed)          .000      .000       .        .000      .000
         N                         219      219       219       219       219
    (4)  Pearson Correlation     -.301**    .979**    .985**     1        .897**
         Sig. (2-tailed)          .000      .000      .000       .        .000
         N                         219      219       219       219       219
    (5)  Pearson Correlation     -.282**    .801**    .956**    .897**     1
         Sig. (2-tailed)          .000      .000      .000      .000       .
         N                         219      219       219       219       219

    ** Correlation is significant at the 0.01 level (2-tailed).

    The assessment of the linearity of the relationship between "life expectancy at birth" [lifeexp] and "population growth rate" [pgrowth] indicated that the relationship could be considered linear because the probability associated with the correlation coefficient for the relationship (r=-0.262) was statistically significant (p

    Slide 112

    Correlations

    Variables:
    (1) Life expectancy at birth - total population
    (2) Percent literate - total population
    (3) Logarithm of Reflected Values of LITERACY [LG10(101-LITERACY)]
    (4) Square Root of Reflected Values of LITERACY [SQRT(101-LITERACY)]
    (5) Inverse of Reflected Values of LITERACY [-1/(101-LITERACY)]

                                   (1)       (2)       (3)       (4)       (5)
    (1)  Pearson Correlation        1       .724**   -.670**   -.720**   -.467**
         Sig. (2-tailed)            .       .000      .000      .000      .000
         N                         219      203       203       203       203
    (2)  Pearson Correlation      .724**     1       -.895**   -.978**   -.594**
         Sig. (2-tailed)          .000       .        .000      .000      .000
         N                         203      203       203       203       203
    (3)  Pearson Correlation     -.670**   -.895**     1        .966**    .857**
         Sig. (2-tailed)          .000      .000       .        .000      .000
         N                         203      203       203       203       203
    (4)  Pearson Correlation     -.720**   -.978**    .966**     1        .717**
         Sig. (2-tailed)          .000      .000      .000       .        .000
         N                         203      203       203       203       203
    (5)  Pearson Correlation     -.467**   -.594**    .857**    .717**     1
         Sig. (2-tailed)          .000      .000      .000      .000       .
         N                         203      203       203       203       203

    ** Correlation is significant at the 0.01 level (2-tailed).

    The transformation "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was incorporated in the analysis in the evaluation of normality. Additional transformations for linearity were not considered.

    Slide 113

    Linearity test with per capita GDP

    Correlations

    Variables:
    (1) Life expectancy at birth - total population
    (2) Per capita GDP
    (3) Logarithm of GDP [LG10(GDP)]
    (4) Square Root of GDP [SQRT(GDP)]
    (5) Inverse of GDP [-1/(GDP)]

                                   (1)       (2)       (3)       (4)       (5)
    (1)  Pearson Correlation        1       .643**    .762**    .713**    .727**
         Sig. (2-tailed)            .       .000      .000      .000      .000
         N                         219      219       219       219       219
    (2)  Pearson Correlation      .643**     1        .898**    .978**    .637**
         Sig. (2-tailed)          .000       .        .000      .000      .000
         N                         219      219       219       219       219
    (3)  Pearson Correlation      .762**    .898**     1        .969**    .890**
         Sig. (2-tailed)          .000      .000       .        .000      .000
         N                         219      219       219       219       219
    (4)  Pearson Correlation      .713**    .978**    .969**     1        .762**
         Sig. (2-tailed)          .000      .000      .000       .        .000
         N                         219      219       219       219       219
    (5)  Pearson Correlation      .727**    .637**    .890**    .762**     1
         Sig. (2-tailed)          .000      .000      .000      .000       .
         N                         219      219       219       219       219

    ** Correlation is significant at the 0.01 level (2-tailed).

    The transformation "square root of per capita GDP [SQGDP=SQRT(GDP)]" was incorporated in the analysis in the evaluation of normality. Additional transformations for linearity were not considered.
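    The linearity tables on these slides correlate the dependent variable with an independent variable and with its logarithm, square root, and inverse. A minimal Python sketch of that comparison follows. The function names and the seven data points are hypothetical and are not taken from the course data set; the sketch computes only the correlation coefficients and leaves out the significance test that the slides rely on.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The transformation forms shown in the slide tables for GDP.
# All of them require strictly positive input values.
TRANSFORMS = {
    "original": lambda v: v,
    "logarithm": math.log10,            # LG10(GDP)
    "square root": math.sqrt,           # SQRT(GDP)
    "inverse": lambda v: -1.0 / v,      # -1/(GDP), sign keeps the case order
}

def linearity_table(iv, dv):
    """Correlate dv with iv and with each transformation of iv."""
    return {name: pearson_r([f(v) for v in iv], dv)
            for name, f in TRANSFORMS.items()}

# Hypothetical illustration data.
gdp = [500, 1000, 2000, 5000, 10000, 20000, 36000]
lifeexp = [45, 52, 58, 66, 72, 77, 79]

table = linearity_table(gdp, lifeexp)
for name, r in table.items():
    print(f"{name:12s} r = {r:.3f}")
```

    For variables that can be zero or negative, the slides add a constant before transforming (for example 2.14+PGROWTH); the sketch omits that step and would need it for such data.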

    Slide 114

    Run the script to test homogeneity of variance - 1

    There were no nonmetric variables in this analysis, so the test of homogeneity of variance was not conducted.

    Slide 115

    Answer 2

    In pre-screening the data for use in a multiple regression of the dependent variable "life expectancy at birth" [lifeexp] with the independent variables "population growth rate" [pgrowth], "percent of the total population who was literate" [literacy], and "per capita GDP" [gdp], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

    1. True

    2. True with caution

    3. False

    4. Inappropriate application of a statistic

    Two transformations were substituted to satisfy the assumption of normality: the "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was substituted for "percent of the total population who was literate" [literacy], and the "square root of per capita GDP [SQGDP=SQRT(GDP)]" was substituted for "per capita GDP" [gdp] in the analysis.

    However, none of the transformations induced normality in the variable "population growth rate" [pgrowth]. A caution was added to the findings.

    The answer to the problem is false. A caution was added because "population growth rate" [pgrowth] did not satisfy the assumption of normality and none of the transformations were successful in inducing normality.

    Slide 116

    Steps in evaluating assumptions: level of measurement

    The following is a guide to the decision process for answering problems about assumptions for multiple regression:

    Is the dependent variable metric and the independent variables metric or dichotomous?

      No:  Incorrect application of a statistic

      Yes: Continue to the assumption of normality

    Slide 117

    Steps in evaluating assumptions: assumption of normality for metric variable

    Does the dependent variable satisfy the criteria for a normal distribution?

      Yes: Assumption satisfied, use untransformed variable in analysis

      No:  Does one or more of the transformations satisfy the criteria for a normal distribution?

             Yes: Assumption satisfied, use transformed variable with smallest skew

             No:  Assumption not satisfied, use untransformed variable in analysis. Add caution to interpretation
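    The normality decision steps on this slide can be sketched as a small Python function. This is one possible reading of the flowchart and not the course script: the function names are hypothetical, and "smallest skew" is interpreted here as the smallest absolute skewness, which is an assumption.

```python
def meets_normality_rule(skew, kurt, limit=1.0):
    """True when skewness and kurtosis both lie between -limit and +limit."""
    return -limit <= skew <= limit and -limit <= kurt <= limit

def choose_variable(original, transformations):
    """Pick the form of a metric variable to use in the analysis.

    original        -- (skewness, kurtosis) of the untransformed variable
    transformations -- dict mapping a transformation name to its
                       (skewness, kurtosis)
    Returns (name of the form to use, whether a caution is needed).
    """
    if meets_normality_rule(*original):
        return "untransformed", False
    passing = {name: stats for name, stats in transformations.items()
               if meets_normality_rule(*stats)}
    if passing:
        # Assumed reading of "smallest skew": smallest absolute skewness.
        best = min(passing, key=lambda name: abs(passing[name][0]))
        return best, False
    return "untransformed", True

# Per capita GDP, using the statistics reported on the earlier slides.
print(choose_variable((1.207, 0.475), {"SQGDP": (0.614, -0.773)}))
# prints ('SQGDP', False)
```

    When no transformation passes the rule, the function returns the untransformed variable together with a caution flag, which corresponds to the outcome described for "population growth rate" [pgrowth] in the answer slide.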

    Slide 118

    Steps in evaluating assumptions: assumption of linearity for metric variables

    Yes

    Probability of correlation (r) for relationship between IV and DV

    Slide 119

    Yes

    Probability of Levene statistic