Assumptions Summer 2003
Assumptions of multiple regression
Assumption of normality
Assumption of linearity
Assumption of homoscedasticity
Script for testing assumptions
Practice problems
Assumptions of Normality, Linearity, and Homoscedasticity
Multiple regression assumes that the variables in the analysis satisfy the assumptions of normality, linearity, and homoscedasticity. (There is also an assumption of independence of errors, but that cannot be evaluated until the regression is run.)
There are two general strategies for checking conformity to assumptions: pre-analysis and post-analysis. In pre-analysis, the variables are checked prior to running the regression. In post-analysis, the assumptions are evaluated by looking at the pattern of residuals (errors or variability) that the regression was unable to predict accurately.
The text recommends pre-analysis, the strategy we will follow.
Assumption of Normality
The assumption of normality prescribes that the distribution of cases fit the pattern of a normal curve.
It is evaluated for all metric variables included in the analysis, independent variables as well as the dependent variable.
With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution.
Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.
Assumption of Normality: Evaluating Normality
There are both graphical and statistical methods for evaluating normality.
Graphical methods include the histogram and normality plot.
Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0.
None of the methods is absolutely definitive.
We will use the criteria that the skewness and kurtosis of the distribution both fall between -1.0 and +1.0.
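As a minimal syntax sketch of this rule of thumb (the variable name netime is borrowed from examples later in the deck), DESCRIPTIVES will print the skewness and kurtosis used in the decision:

* Skewness and kurtosis between -1.0 and +1.0 suggest the variable is reasonably close to normal.
DESCRIPTIVES VARIABLES=netime
  /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.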
Assumption of Normality: Histograms and Normality Plots
[Figures: Histogram of RS OCCUPATIONAL PRESTIGE SCORE (1980) (Std. Dev = 13.94, Mean = 44.2, N = 255) with its Normal Q-Q plot; histogram of TIME SPENT USING E-MAIL (Std. Dev = 6.14, Mean = 3.6, N = 119) with its Normal Q-Q plot.]
On the left side of the slide are the histogram and normality plot for occupational prestige, which could reasonably be characterized as normal. Time using email, on the right, is not normally distributed.
Assumption of Normality: Hypothesis test of normality
Tests of Normality (Kolmogorov-Smirnov with Lilliefors Significance Correction; Shapiro-Wilk)
  RS OCCUPATIONAL PRESTIGE SCORE (1980): Kolmogorov-Smirnov Statistic = .121, df = 255, Sig. = .000; Shapiro-Wilk Statistic = .964, df = 255, Sig. = .000
  TIME SPENT USING E-MAIL: Kolmogorov-Smirnov Statistic = .296, df = 119, Sig. = .000; Shapiro-Wilk Statistic = .601, df = 119, Sig. = .000
The hypothesis test for normality tests the null hypothesis that the variable is normal, i.e. the actual distribution of the variable fits the pattern we would expect if it is normal. If we fail to reject the null hypothesis, we conclude that the distribution is normal.
The distributions for both of the variables depicted on the previous slide are associated with low significance values that lead to rejecting the null hypothesis and concluding that neither occupational prestige nor time using email is normally distributed.
Assumption of Normality: Skewness, kurtosis, and normality
Using the rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0, we would decide that occupational prestige is normally distributed and time using email is not.
We will use this rule of thumb for normality in our strategy for solving problems.
Assumption of Normality: Transformations
When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis.
Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation.
All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
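A sketch of the three compute statements in SPSS syntax (LGNETIME matches a name used later in the deck; the other two names, and the assumption that netime is strictly positive, are illustrative only):

COMPUTE lgnetime = LG10(netime).
COMPUTE sqnetime = SQRT(netime).
COMPUTE innetime = -1/netime.
EXECUTE.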
Assumption of Normality: Computing Transformations
We will use SPSS scripts as described below to test assumptions and compute transformations.
For additional details on the mechanics of computing transformations, see Computing Transformations.
Assumption of Normality: When transformations do not work
When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.
We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.
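As an illustrative sketch only (the variable and the cut points are assumptions, not taken from the text), a single metric variable could be replaced by a set of dichotomous indicators with RECODE:

* Hypothetical cut points; in a regression, one indicator would be omitted as the reference category.
RECODE netime (MISSING=SYSMIS) (LOWEST THRU 5=1) (ELSE=0) INTO netlow.
RECODE netime (MISSING=SYSMIS) (5.01 THRU 20=1) (ELSE=0) INTO netmid.
RECODE netime (MISSING=SYSMIS) (20.01 THRU HIGHEST=1) (ELSE=0) INTO nethigh.
EXECUTE.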
Assumption of Normality: Computing Explore descriptive statistics
To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.
Assumption of Normality: Adding the variable to be evaluated
First, click on the variable to be included in the analysis to highlight it.
Second, click on the right arrow button to move the highlighted variable to the Dependent List.
Assumption of Normality: Selecting statistics to be computed
To select the statistics for the output, click on the Statistics command button.
Assumption of Normality: Including descriptive statistics
First, click on the Descriptives checkbox to select it. Clear the other checkboxes.
Second, click on the Continue button to complete the request for statistics.
Assumption of Normality: Selecting charts for the output
To select the diagnostic charts for the output, click on the Plots command button.
Assumption of Normality: Including diagnostic plots and statistics
First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.
Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.
Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.
Finally, click on the Continue button to complete the request.
Assumption of Normality: Completing the specifications for the analysis
Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.
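For reference, the same Explore request could be pasted as syntax roughly like the following (a sketch, assuming the variable being screened is netime):

EXAMINE VARIABLES=netime
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.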
Assumption of Normality: The histogram
[Figure: Histogram of TOTAL TIME SPENT ON THE INTERNET (Std. Dev = 15.35, Mean = 10.7, N = 93).]
An initial impression of the normality of the distribution can be gained by examining the histogram.
In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.
Assumption of Normality: The normality plot
[Figure: Normal Q-Q plot of TOTAL TIME SPENT ON THE INTERNET.]
The problem with the normality of this variable's distribution is reinforced by the normality plot.
If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.
Assumption of Normality: The test of normality
Tests of Normality (Kolmogorov-Smirnov with Lilliefors Significance Correction; Shapiro-Wilk)
  TOTAL TIME SPENT ON THE INTERNET: Kolmogorov-Smirnov Statistic = .246, df = 93, Sig. = .000; Shapiro-Wilk Statistic = .606, df = 93, Sig. = .000
Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.
The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001 when SPSS displays a significance of .000.)
Assumption of Normality: The rule of thumb for skewness and kurtosis
Descriptives: TOTAL TIME SPENT ON THE INTERNET
  Mean = 10.7312 (Std. Error = 1.59183); 95% Confidence Interval for Mean = 7.5697 to 13.8927
  5% Trimmed Mean = 8.2949; Median = 5.5000; Variance = 235.655; Std. Deviation = 15.35106
  Minimum = .20; Maximum = 102.00; Range = 101.80; Interquartile Range = 10.2000
  Skewness = 3.532 (Std. Error = .250); Kurtosis = 15.614 (Std. Error = .495)
Using the rule of thumb for evaluating normality with the skewness and kurtosis statistics, we look at the table of descriptive statistics.
The skewness and kurtosis for the variable both fall outside the rule of thumb range of -1.0 to +1.0. The variable is not normally distributed.
Assumption of Linearity
Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.
Linearity characterizes the relationship between two metric variables. It is tested for the pairs formed by the dependent variable and each metric independent variable in the analysis.
There are relationships that are not linear.
The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.
Population growth may not be linear. The pattern often shows growth at increasing rates over time.
Assumption of Linearity: Evaluating linearity
There are both graphical and statistical methods for evaluating linearity.
Graphical methods include the examination of scatterplots, often overlaid with a trendline. While commonly recommended, this strategy is difficult to implement.
Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
Assumption of Linearity: Interpreting scatterplots
[Figure: Scatterplot involving RESPONDENT'S SOCIOECONOMIC INDEX.]
The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.
Assumption of Linearity: Interpreting scatterplots
[Figure: Scatterplot involving Gross domestic product / capita.]
Sometimes, a scatterplot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatterplot.
Assumption of Linearity: Scatterplots that are difficult to interpret
[Figures: Two scatterplots, one involving AGE OF RESPONDENT and one involving HOURS PER DAY WATCHING TV.]
The correlations for both of these relationships are low.
The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatterplots.
Assumption of Linearity: Using correlation matrices
Correlations (Pearson r, 2-tailed Sig., N = 93)
  TOTAL TIME SPENT ON THE INTERNET with:
    AGE OF RESPONDENT: r = .017, Sig. = .874
    Logarithm of AGE [LG10(AGE)]: r = .048, Sig. = .648
    Square Root of AGE [SQRT(AGE)]: r = .032, Sig. = .761
    Inverse of AGE [-1/(AGE)]: r = .079, Sig. = .453
  (AGE and its transformations correlate .916 to .995 with one another, all significant at the 0.01 level, N = 270.)
Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret.
The information that we need is in the first column of the matrix, which shows the correlation and significance for the dependent variable and all forms of the independent variable.
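A syntax sketch of the computations behind this matrix (the transformed variable names lgage, sqage, and inage are assumptions chosen to match the labels above):

COMPUTE lgage = LG10(age).
COMPUTE sqage = SQRT(age).
COMPUTE inage = -1/age.
EXECUTE.
CORRELATIONS
  /VARIABLES=netime age lgage sqage inage
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.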
Assumption of Linearity: The pattern of correlations for no relationship
[The same correlations table as on the previous slide.]
The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.
Moreover, none of the significance tests for the correlations with the transformed versions of the independent variable are statistically significant. There is no relationship between these variables; it is not a problem with non-linearity.
Assumption of Linearity: Correlation pattern suggesting transformation
Correlations (Pearson r, 2-tailed Sig., N = 219)
  Infant mortality rate with:
    population: r = .021, Sig. = .762
    Logarithm of POP [LG10(POP)]: r = .258**, Sig. = .000
    Square Root of POP [SQRT(POP)]: r = .066, Sig. = .331
    Inverse of POP [-1/(POP)]: r = .198**, Sig. = .003
  **. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.
However, the probability associated with the larger correlation for the logarithmic transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.
Assumption of Linearity: Correlation pattern suggesting substitution
Should it happen that the correlation between a transformed independent variable and the dependent variable is substantially stronger than the relationship between the untransformed independent variable and the dependent variable, the transformation should be considered even if the relationship involving the untransformed independent variable is statistically significant.
A difference of +0.20 or -0.20, or more, would be considered substantial enough, since a change of this size would alter our interpretation of the relationship.
Assumption of Linearity: Transformations
When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.
Three common transformations to induce linearity are: the logarithmic transformation, the square root transformation, and the inverse transformation.
All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.
Assumption of Linearity: When transformations do not work
When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship, i.e. we lose power.
We do have the option of changing the way the information in the variables is represented, e.g. substitute several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
Assumption of Linearity: Creating the scatterplot
Suppose we are interested in the linearity of the relationship between "hours per day watching TV" and "total hours spent on the Internet".
The most commonly recommended strategy for evaluating linearity is visual examination of a scatterplot.
To obtain a scatterplot in SPSS, select the Scatter command from the Graphs menu.
Assumption of Linearity: Selecting the type of scatterplot
First, click on the thumbnail sketch of a simple scatterplot to highlight it.
Second, click on the Define button to specify the variables to be included in the scatterplot.
Assumption of Linearity: Selecting the variables
First, move the dependent variable netime to the Y Axis text box.
Second, move the independent variable tvhours to the X Axis text box.
If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.
Third, click on the OK button to complete the specifications for the scatterplot.
Assumption of Linearity: The scatterplot
The scatterplot is produced in the SPSS output viewer.
The points in a scatterplot are considered linear if they form a cigar-shaped elliptical band.
The pattern in this scatterplot is not really clear.
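The equivalent syntax for this simple scatterplot would look something like the following sketch, using the variable names from the problem:

GRAPH
  /SCATTERPLOT(BIVAR)=tvhours WITH netime
  /MISSING=LISTWISE.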
Assumption of Linearity: Adding a trendline
To try to determine if the relationship is linear, we can add a trendline to the chart.
To add a trendline to the chart, we need to open the chart for editing.
To open the chart for editing, double click on it.
Assumption of Linearity: The scatterplot in the SPSS Chart Editor
The chart that we double clicked on is opened for editing in the SPSS Chart Editor.
To add the trendline, select the Options command from the Chart menu.
Assumption of Linearity: Requesting the fit line
In the Scatterplot Options dialog box, we click on the Total checkbox in the Fit Line panel in order to request the trend line.
Click on the Fit Options button to request the r² coefficient of determination as a measure of the strength of the relationship.
Assumption of Linearity: Requesting r²
First, the Linear regression thumbnail sketch should be highlighted as the type of fit line to be added to the chart.
Second, click on the Display R-square in Legend checkbox to add this item to our output.
Third, click on the Continue button to complete the options request.
Assumption of Linearity: The fit line and r²
The red fit line is added to the chart.
The value of r² (0.0460) suggests that the relationship is weak.
Assumption of Linearity: Computing the transformations
There are four transformations that we can use to achieve or improve linearity.
The compute dialogs for these four transformations for linearity are shown.
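A sketch of the four compute statements, using the transformed-variable names that appear in the scatterplot matrix later (LGTVHOUR, SQTVHOUR, INTVHOUR, S2TVHOUR); the +1 start value guarding the log and inverse against zero hours is an assumption, since the dialog entries themselves are not legible in the transcript:

COMPUTE lgtvhour = LG10(tvhours + 1).
COMPUTE sqtvhour = SQRT(tvhours).
COMPUTE intvhour = -1/(tvhours + 1).
COMPUTE s2tvhour = tvhours**2.
EXECUTE.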
Assumption of Linearity: Creating the scatterplot matrix
To create the scatterplot matrix, select the Scatter command in the Graphs menu.
Assumption of Linearity: Selecting type of scatterplot
First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want.
Second, click on the Define button to select the variables for the scatterplot.
Assumption of Linearity: Specifications for scatterplot matrix
First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box.
Second, click on the OK button to produce the scatterplot.
Assumption of Linearity: The scatterplot matrix
[Figure: Scatterplot matrix of TOTAL TIME SPENT ON THE INTERNET, HOURS PER DAY WATCHING TV, LGTVHOUR, SQTVHOUR, INTVHOUR, and S2TVHOUR.]
The scatterplot matrix shows a thumbnail sketch of scatterplots for each independent variable or transformation with the dependent variable. The scatterplot matrix may suggest which transformations might be useful.
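The corresponding syntax sketch for the matrix scatterplot (same assumed variable names as above):

GRAPH
  /SCATTERPLOT(MATRIX)=netime tvhours lgtvhour sqtvhour intvhour s2tvhour
  /MISSING=LISTWISE.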
Assumption of Linearity: Creating the correlation matrix
To create the correlation matrix, select the Correlate | Bivariate command in the Analyze menu.
Assumption of Linearity: Specifications for correlation matrix
First, move the dependent variable, the independent variable and all of the transformations to the Variables list box.
Second, click on the OK button to produce the correlation matrix.
Assumption of Linearity: The correlation matrix
Correlations (Pearson r, 2-tailed Sig., N = 68)
  TOTAL TIME SPENT ON THE INTERNET with:
    HOURS PER DAY WATCHING TV: r = .215, Sig. = .079
    LGTVHOUR: r = .104, Sig. = .397
    SQTVHOUR: r = .156, Sig. = .203
    INTVHOUR: r = .045, Sig. = .713
    S2TVHOUR: r = .328**, Sig. = .006
  **. Correlation is significant at the 0.01 level (2-tailed).
The answers to the problems are based on the correlation matrix.
Before we answer the question in this problem, we will use a script to produce the output.
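The same matrix could also be produced directly with a syntax sketch like this (again assuming the transformed-variable names shown above):

CORRELATIONS
  /VARIABLES=netime tvhours lgtvhour sqtvhour intvhour s2tvhour
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.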
Assumption of Homoscedasticity
Homoscedasticity refers to the assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable.
While it applies to independent variables at all three measurement levels, the methods that we will use to evaluate homoscedasticity require that the independent variable be non-metric (nominal or ordinal) and the dependent variable be metric (ordinal or interval). When both variables are metric, the assumption is evaluated as part of the residual analysis in multiple regression.
Assumption of Homoscedasticity: Evaluating homoscedasticity
Homoscedasticity is evaluated for pairs of variables.
There are both graphical and statistical methods for evaluating homoscedasticity.
The graphical method is called a boxplot.
The statistical method is the Levene statistic, which SPSS computes for the test of homogeneity of variances.
Neither of the methods is absolutely definitive.
Assumption of Homoscedasticity: The boxplot
[Figure: Boxplot of the dependent variable (highest academic degree) for the categories of MARITAL STATUS: MARRIED, WIDOWED, DIVORCED, SEPARATED, NEVER MARRIED.]
Each red box shows the middle 50% of the cases for the group, indicating how spread out the group of scores is.
If the variance across the groups is equal, the height of the red boxes will be similar across the groups.
If the heights of the red boxes are different, the plot suggests that the variance across groups is not homogeneous.
The married group is more spread out than the other groups, suggesting unequal variance.
Assumption of Homoscedasticity: Levene test of the homogeneity of variance
Test of Homogeneity of Variances: RS HIGHEST DEGREE
  Levene Statistic = 5.239, df1 = 4, df2 = 262, Sig. = .000
The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.
Since the probability associated with the Levene statistic (< 0.001) is less than or equal to the level of significance, we reject the null hypothesis and conclude that the variance is not homogeneous.
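A syntax sketch of this Levene test through One-Way ANOVA (degree and marital are assumed names for "RS highest degree" and "marital status"):

ONEWAY degree BY marital
  /STATISTICS HOMOGENEITY
  /MISSING ANALYSIS.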
Assumption of Homoscedasticity: Transformations
When the assumption of homoscedasticity is not supported, we can transform the dependent variable and test it for homoscedasticity. If the transformed variable demonstrates homoscedasticity, we can substitute it in our analysis.
We use the same three common transformations that we used for normality: the logarithmic transformation, the square root transformation, and the inverse transformation.
All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
Assumption of Homoscedasticity: When transformations do not work
When none of the transformations results in homoscedasticity for the variables in the relationship, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.
Assumption of Homoscedasticity: Request a boxplot
Suppose we want to test for homogeneity of variance: whether the variance in "highest academic degree" is homogeneous for the categories of "marital status."
The boxplot provides a visual image of the distribution of the dependent variable for the groups defined by the independent variable.
To request a boxplot, choose the Boxplot command from the Graphs menu.
Assumption of Homoscedasticity: Specify the type of boxplot
First, click on the Simple style of boxplot to highlight it with a rectangle around the thumbnail drawing.
Second, click on the Define button to specify the variables to be plotted.
Assumption of Homoscedasticity: Specify the dependent variable
First, click on the dependent variable to highlight it.
Second, click on the right arrow button to move the dependent variable to the Variable text box.
Assumption of Homoscedasticity: Specify the independent variable
First, click on the independent variable to highlight it.
Second, click on the right arrow button to move the independent variable to the Category Axis text box.
Assumption of Homoscedasticity: Complete the request for the boxplot
To complete the request for the boxplot, click on the OK button.
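A pasted-syntax sketch of this boxplot request (again assuming the variables are named degree and marital):

EXAMINE VARIABLES=degree BY marital
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.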
Assumption of Homoscedasticity: Request the test for homogeneity of variance
To compute the Levene test for homogeneity of variance, select the Compare Means | One-Way ANOVA command from the Analyze menu.
Assumption of Homoscedasticity: Specify the dependent variable
First, click on the dependent variable to highlight it.
Second, click on the right arrow button to move the dependent variable to the Dependent List text box.
Assumption of Homoscedasticity: The homogeneity of variance test is an option
Click on the Options button to open the options dialog box.
Assumption of Homoscedasticity: Complete the request for output
Click on the OK button to complete the request for the homogeneity of variance test through the One-Way ANOVA procedure.
Assumption of Homoscedasticity: Interpreting the homogeneity of variance test
Test of Homogeneity of Variances: RS HIGHEST DEGREE
  Levene Statistic = 5.239, df1 = 4, df2 = 262, Sig. = .000
The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.
Since the probability associated with the Levene statistic (< 0.001) is less than or equal to the level of significance, we reject the null hypothesis and conclude that the variance is not homogeneous.
Using scripts
The process of evaluating assumptions requires numerous SPSS procedures and outputs that are time consuming to produce.
These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.
Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating assumptions.
Using a script for evaluating assumptions
The script EvaluatingAssumptionsAndMissingData.exe will produce all of the output we have used for evaluating assumptions.
Navigate to the link SPSS Scripts and Syntax on the course web page.
Download the script file EvaluatingAssumptionsAndMissingData.exe to your computer and install it, following the directions on the web page.
Invoke the script in SPSS
To invoke the script, select the Run Script command in the Utilities menu.
Select the script
First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder.
If you only see a file with an .EXE extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.
Second, click on the script name to highlight it.
Third, click on the Run button to start the script.
The script dialog
The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.
Complete the specifications - 1
Move the dependent and independent variables from the list of variables to the list boxes. Metric and nonmetric variables are moved to separate lists so the computer knows how you want them treated.
You must also indicate the level of measurement for the dependent variable. By default the metric option button is marked.
Complete the specifications - 2
Mark the option button for the type of output you want the script to compute.
Select the transformations to be tested.
Click on the OK button to produce the output.
The script finishes
If your SPSS output viewer is open, you will see the output produced in that window.
Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished.
Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.
When you see this alert, click on the OK button.
Output from the script - 1
The script will produce lots of output. Additional descriptive material in the titles should help link specific outputs to specific tasks.
Scroll through the output to locate the results needed to answer the question.
Closing the script dialog box
The script dialog box does not close automatically because we often want to run another test right away. There are two methods for closing the dialog box.
Click on the Cancel button to close the script.
Click on the X close box to close the script.
Problem 1
In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.
In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Level of measurement
In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.
In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.
Since we are pre-screening for a multiple regression problem, we should make sure we satisfy the level of measurement requirement before proceeding.
"Total hours spent on the Internet" [netime] is interval, satisfying the metric level of measurement requirement for the dependent variable.
"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.
"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.
"Income" [rincom98] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables. Since some data analysts do not agree with this convention of treating an ordinal variable as metric, a note of caution should be included in our interpretation.
Run the script to test normality - 2
First, navigate to the SW388R7 folder on your computer.
Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS.
Third, click on the Run button to open the script.
Run the script to test normality - 3
First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.
Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.
Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.
Fourth, click on the OK button to produce the output.
Normality of the dependent variable
Descriptives: TOTAL TIME SPENT ON THE INTERNET
  Mean = 10.7312 (Std. Error = 1.59183); 95% Confidence Interval for Mean = 7.5697 to 13.8927
  5% Trimmed Mean = 8.2949; Median = 5.5000; Variance = 235.655; Std. Deviation = 15.35106
  Minimum = .20; Maximum = 102.00; Range = 101.80; Interquartile Range = 10.2000
  Skewness = 3.532 (Std. Error = .250); Kurtosis = 15.614 (Std. Error = .495)
The dependent variable "total hours spent on the Internet" [netime] did not satisfy the criteria for a normal distribution. Both the skewness (3.532) and kurtosis (15.614) fell outside the range from -1.0 to +1.0.
Normality of transformed dependent variable
Since "total hours spent on the Internet" [netime] did not satisfy the criteria for normality, we examine the skewness and kurtosis of each of the transformations to see if any of them satisfy the criteria.
The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.150) was between -1.0 and +1.0 and the kurtosis of the distribution (0.127) was between -1.0 and +1.0.
The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was substituted for "total hours spent on the Internet" [netime] in the analysis.
Normality of the independent variables - 2
Descriptives: RESPONDENTS INCOME
  Mean = 13.35 (Std. Error = .419); 95% Confidence Interval for Mean = 12.52 to 14.18
  5% Trimmed Mean = 13.54; Median = 15.00; Variance = 29.535; Std. Deviation = 5.435
  Minimum = 1; Maximum = 23; Range = 22; Interquartile Range = 8.00
  Skewness = -.686 (Std. Error = .187); Kurtosis = -.253 (Std. Error = .373)
The independent variable "income" [rincom98] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.686) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.253) was between -1.0 and +1.0.
Run the script to test linearity - 1
If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.
First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.
When the linearity option is selected, a default set of transformations to test is marked.
Run the script to test linearity - 2
Since we have already decided to use the log of the dependent variable to satisfy normality, that is the form of the dependent variable we want to evaluate with the independent variables. Mark this checkbox for the dependent variable and clear the others.
Click on the OK button to produce the output.
Linearity test with age of respondent
Correlations (Pearson r, 2-tailed Sig., N = 93)
  Logarithm of NETIME [LG10(NETIME)] with:
    AGE OF RESPONDENT: r = .074, Sig. = .483
    Logarithm of AGE [LG10(AGE)]: r = .119, Sig. = .257
    Square Root of AGE [SQRT(AGE)]: r = .096, Sig. = .362
    Inverse of AGE [-1/(AGE)]: r = .164, Sig. = .116
The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "age" [age] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=0.074, p=0.483), the logarithmic transformation (r=0.119, p=0.257), the square root transformation (r=0.096, p=0.362), and the inverse transformation (r=0.164, p=0.116) were all greater than the level of significance for testing assumptions (0.01).
There was no evidence that the assumption of linearity was violated.
Linearity test with respondent's income
Correlations (Pearson r, 2-tailed Sig., N = 72)
  Logarithm of NETIME [LG10(NETIME)] with:
    RESPONDENTS INCOME: r = -.053, Sig. = .658
    Logarithm of Reflected Values of RINCOM98 [LG10(24-RINCOM98)]: r = .063, Sig. = .600
    Square Root of Reflected Values of RINCOM98 [SQRT(24-RINCOM98)]: r = .060, Sig. = .617
    Inverse of Reflected Values of RINCOM98 [-1/(24-RINCOM98)]: r = .073, Sig. = .540
The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "income" [rincom98] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=-0.053, p=0.658), the logarithmic transformation (r=0.063, p=0.600), the square root transformation (r=0.060, p=0.617), and the inverse transformation (r=0.073, p=0.540) were all greater than the level of significance for testing assumptions (0.01).
There was no evidence that the assumption of linearity was violated.
Run the script to test homogeneity of variance - 1
If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.
First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity.
When the homogeneity of variance option is selected, a default set of transformations to test is marked.
Run the script to test homogeneity of variance - 2
In this problem, we have already decided to use the log transformation for the dependent variable, so we only need to test it. Next, clear all of the transformation checkboxes except for Logarithmic.
Finally, click on the OK button to produce the output.
Levene test of homogeneity of variance
Test of Homogeneity of Variances: Logarithm of NETIME [LG10(NETIME)]
  Levene Statistic = .166, df1 = 1, df2 = 91, Sig. = .685
Based on the Levene test, the variance in "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.166) was p=0.685, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.
The homogeneity of variance assumption was satisfied.
Answer 1
In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The logarithmic transformation of the dependent variable [LGNETIME=LG10(NETIME)] solved the only problem with normality that we encountered. In that form, the relationship with the metric independent variables was weak, but there was no evidence of nonlinearity. The variance of the log transform of the dependent variable was homogeneous for the categories of the nonmetric variable sex.
No cautions were needed because of a violation of assumptions. A caution was needed because respondent's income was ordinal level.
The answer to the problem is true with caution.
Run the script to test normality - 1
To run the script to test assumptions, choose the Run Script command from the Utilities menu.
Run the script to test normality - 2
First, navigate to the SW388R7 folder on your computer.
Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS.
Third, click on the Run button to open the script.
Normality of the first independent variable
Descriptives: Population growth rate
  Mean = 1.4944 (Std. Error = .09456); 95% Confidence Interval for Mean = 1.3081 to 1.6808
  5% Trimmed Mean = 1.4365; Median = 1.4000; Variance = 1.958; Std. Deviation = 1.39929
  Minimum = -1.14; Maximum = 13.39; Range = 14.53; Interquartile Range = 1.8000
  Skewness = 2.885 (Std. Error = .164); Kurtosis = 22.665 (Std. Error = .327)
The independent variable "population growth rate" [pgrowth] did not satisfy the criteria for a normal distribution. Both the skewness (2.885) and kurtosis (22.665) fell outside the range from -1.0 to +1.0.
Normality of transformed independent variable
Neither the logarithmic (skew=-0.218, kurtosis=1.277), the square root (skew=0.873, kurtosis=5.273), nor the inverse transformation (skew=-1.836, kurtosis=5.763) induced normality in the variable "population growth rate" [pgrowth].
A caution was added to the findings.
Normality of transformed independent variable
Since the distribution was skewed to the left, it was necessary to reflect, or reverse code, the values for the variable before computing the transformation.
The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.567) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.964) was between -1.0 and +1.0. The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was substituted for "percent of the total population who was literate" [literacy] in the analysis.
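A sketch of the reflect-and-transform computation described here, using the constant 101 and the name SQLITERA from the slide (literacy is assumed to be the SPSS variable name for "percent of the total population who was literate"):

* Reflect (reverse code) so the left-skewed variable becomes right skewed, then take the square root.
COMPUTE sqlitera = SQRT(101 - literacy).
EXECUTE.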
Normality of the third independent variable
Descriptives: Per capita GDP
  Mean = 8554.43 (Std. Error = 580.523); 95% Confidence Interval for Mean = 7410.27 to 9698.59
  5% Trimmed Mean = 7818.67; Median = 5000.00; Variance = 7.4E+07; Std. Deviation = 8590.954
  Minimum = 510; Maximum = 36400; Range = 35890; Interquartile Range = 11200.00
  Skewness = 1.207 (Std. Error = .164); Kurtosis = .475 (Std. Error = .327)
The independent variable "per capita GDP" [gdp] did not satisfy the criteria for a normal distribution. The kurtosis of the distribution (0.475) was between -1.0 and +1.0, but the skewness of the distribution (1.207) fell outside the range from -1.0 to +1.0.
Normality of transformed independent variable
The "square root of per capita GDP [SQGDP=SQRT(GDP)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.614) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.773) was between -1.0 and +1.0.
The "square root of per capita GDP [SQGDP=SQRT(GDP)]" was substituted for "per capita GDP" [gdp] in the analysis.
Run the script to test linearity - 1
If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.
First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity. When the linearity option is selected, a default set of transformations to test is marked.
Click on the OK button to produce the output.
Linearity test with population growth rate
Correlations (Pearson r, 2-tailed Sig., N = 219)
  Life expectancy at birth - total population with:
    Population growth rate: r = -.262**, Sig. = .000
    Logarithm of PGROWTH [LG10(2.14+PGROWTH)]: r = -.314**, Sig. = .000
    Square Root of PGROWTH [SQRT(2.14+PGROWTH)]: r = -.301**, Sig. = .000
    Inverse of PGROWTH [-1/(2.14+PGROWTH)]: r = -.282**, Sig. = .000
  **. Correlation is significant at the 0.01 level (2-tailed).
The assessment of the linearity of the relationship between "life expectancy at birth" [lifeexp] and "population growth rate" [pgrowth] indicated that the relationship could be considered linear because the probability associated with the correlation coefficient for the relationship (r=-0.262) was statistically significant (p < 0.001).
Linearity test with percent literate - total population
Correlations (Pearson r, 2-tailed Sig., N = 203)
  Life expectancy at birth - total population with:
    Percent literate - total population: r = .724**, Sig. = .000
    Logarithm of Reflected Values of LITERACY [LG10(101-LITERACY)]: r = -.670**, Sig. = .000
    Square Root of Reflected Values of LITERACY [SQRT(101-LITERACY)]: r = -.720**, Sig. = .000
    Inverse of Reflected Values of LITERACY [-1/(101-LITERACY)]: r = -.467**, Sig. = .000
  **. Correlation is significant at the 0.01 level (2-tailed).
The transformation "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was incorporated in the analysis in the evaluation of normality. Additional transformations for linearity were not considered.
Linearity test with per capita GDP
Correlations (Pearson r, 2-tailed Sig., N = 219)
  Life expectancy at birth - total population with:
    Per capita GDP: r = .643**, Sig. = .000
    Logarithm of GDP [LG10(GDP)]: r = .762**, Sig. = .000
    Square Root of GDP [SQRT(GDP)]: r = .713**, Sig. = .000
    Inverse of GDP [-1/(GDP)]: r = .727**, Sig. = .000
  **. Correlation is significant at the 0.01 level (2-tailed).
The transformation "square root of per capita GDP [SQGDP=SQRT(GDP)]" was incorporated in the analysis in the evaluation of normality. Additional transformations for linearity were not considered.
Run the script to test homogeneity of variance - 1
There were no nonmetric variables in this analysis, so the test of homogeneity of variance was not conducted.
Answer 2
In pre-screening the data for use in a multiple regression of the dependent variable "life expectancy at birth" [lifeexp] with the independent variables "population growth rate" [pgrowth], "percent of the total population who was literate" [literacy], and "per capita GDP" [gdp], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Two transformations were substituted to satisfy the assumption of normality: the "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" for "percent of the total population who was literate" [literacy], and the "square root of per capita GDP [SQGDP=SQRT(GDP)]" for "per capita GDP" [gdp].
However, none of the transformations induced normality in the variable "population growth rate" [pgrowth]. A caution was added to the findings.
The answer to the problem is false. A caution was added because "population growth rate" [pgrowth] did not satisfy the assumption of normality and none of the transformations were successful in inducing normality.
Steps in evaluating assumptions: level of measurement
The following is a guide to the decision process for answering problems about assumptions for multiple regression:
Is the dependent variable metric and the independent variables metric or dichotomous?
  No: Incorrect application of a statistic.
  Yes: Continue with the checks on the following slides.
Steps in evaluating assumptions: assumption of normality for metric variable
Does the variable satisfy the criteria for a normal distribution?
  Yes: Assumption satisfied; use the untransformed variable in the analysis.
  No: Does one or more of the transformations satisfy the criteria for a normal distribution?
    Yes: Assumption satisfied; use the transformed variable with the smallest skew.
    No: Assumption not satisfied; use the untransformed variable in the analysis and add a caution to the interpretation.
Steps in evaluating assumptions: assumption of linearity for metric variables
[Flowchart: decision based on the probability of the correlation (r) for the relationship between the IV and DV.]
Steps in evaluating assumptions: assumption of homogeneity of variance
[Flowchart: decision based on the probability of the Levene statistic.]