assumptions summer2003

Author: ian-pratama

Post on 03-Jun-2018


• 8/12/2019 Assumptions Summer2003

1/119

SW388R7

Data Analysis &

Computers II

Slide 1

Assumptions of multiple regression

Assumption of normality

Assumption of linearity

Assumption of homoscedasticity

Script for testing assumptions

Practice problems


Slide 2

Assumptions of Normality, Linearity, and Homoscedasticity

Multiple regression assumes that the variables in the analysis satisfy the assumptions of normality, linearity, and homoscedasticity. (There is also an assumption of independence of errors, but that cannot be evaluated until the regression is run.)

There are two general strategies for checking conformity to assumptions: pre-analysis and post-analysis. In pre-analysis, the variables are checked prior to running the regression. In post-analysis, the assumptions are evaluated by looking at the pattern of residuals (errors or variability) that the regression was unable to predict accurately.

The text recommends pre-analysis, the strategy we will follow.


Slide 3

Assumption of Normality

The assumption of normality prescribes that the distribution of cases fit the pattern of a normal curve.

It is evaluated for all metric variables included in the analysis, independent variables as well as the dependent variable.

With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution.

Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.


Slide 4

Assumption of Normality: Evaluating Normality

There are both graphical and statistical methods for evaluating normality.

Graphical methods include the histogram and normality plot.

Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0.

None of the methods is absolutely definitive.

We will use the criterion that the skewness and kurtosis of the distribution both fall between -1.0 and +1.0.
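The rule of thumb can be sketched outside SPSS. The Python below is only an illustration: the function names are hypothetical, and it assumes the bias-adjusted sample formulas for skewness and excess kurtosis, which we believe correspond to the values SPSS reports but have not checked against SPSS output here.

```python
import math

def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from the bias-adjusted sample formulas."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 cases")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s = math.sqrt(sum(d ** 2 for d in dev) / (n - 1))  # sample standard deviation
    sum3 = sum(d ** 3 for d in dev)
    sum4 = sum(d ** 4 for d in dev)
    skew = n / ((n - 1) * (n - 2)) * sum3 / s ** 3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum4 / s ** 4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt

def passes_rule_of_thumb(values, limit=1.0):
    """True when both skewness and kurtosis fall between -limit and +limit."""
    skew, kurt = skewness_kurtosis(values)
    return -limit <= skew <= limit and -limit <= kurt <= limit

# Hypothetical data: one extreme value produces strong positive skew.
print(passes_rule_of_thumb([1, 1, 1, 1, 1, 1, 1, 1, 1, 100]))
```

A variable passes only when both statistics are inside the band, so a symmetric but very flat distribution can still fail on kurtosis alone.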


Slide 5

Assumption of Normality: Histograms and Normality Plots

[Figure, left: histogram and normal Q-Q plot of RS OCCUPATIONAL PRESTIGE SCORE (1980); Mean = 44.2, Std. Dev = 13.94, N = 255.]

[Figure, right: histogram and normal Q-Q plot of TIME SPENT USING E-MAIL; Mean = 3.6, Std. Dev = 6.14, N = 119.]

On the left side of the slide are the histogram and normality plot for occupational prestige, which could reasonably be characterized as normal. Time using email, on the right, is not normally distributed.


Slide 6

Assumption of Normality: Hypothesis test of normality

Tests of Normality

                                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                                         Statistic   df    Sig.     Statistic   df    Sig.
RS OCCUPATIONAL PRESTIGE SCORE (1980)    .121        255   .000     .964        255   .000
TIME SPENT USING E-MAIL                  .296        119   .000     .601        119   .000

a. Lilliefors Significance Correction

The hypothesis test for normality tests the null hypothesis that the variable is normal, i.e. the actual distribution of the variable fits the pattern we would expect if it is normal. If we fail to reject the null hypothesis, we conclude that the distribution is normal.

The distributions of both variables depicted on the previous slide are associated with low significance values that lead to rejecting the null hypothesis and concluding that neither occupational prestige nor time using email is normally distributed.


Slide 7

Assumption of Normality: Skewness, kurtosis, and normality

Using the rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0, we would decide that occupational prestige is normally distributed and time using email is not.

We will use this rule of thumb for normality in our strategy for solving problems.


Slide 8

Assumption of Normality: Transformations

When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis.

Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation.

All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
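As a rough illustration of the three transformations, here is a minimal Python sketch. The function names and the sample values are hypothetical; the formulas follow the labels used in the correlation tables later in this deck, LG10(AGE), SQRT(AGE), and -1/(AGE). It assumes all values are positive; data containing zeros or negative values would need an adjustment that is not shown here.

```python
import math

def log_transform(x):
    """Base-10 logarithm; only defined for x > 0."""
    return math.log10(x)

def sqrt_transform(x):
    """Square root; only defined for x >= 0."""
    return math.sqrt(x)

def inverse_transform(x):
    """Negative reciprocal; the minus sign keeps the cases in their original order."""
    return -1.0 / x

# Hypothetical positively skewed values, e.g. hours per week.
hours = [0.2, 1.0, 5.5, 10.0, 102.0]
for name, fn in [("log", log_transform), ("sqrt", sqrt_transform), ("inverse", inverse_transform)]:
    print(name, [round(fn(h), 3) for h in hours])
```

Each transformation compresses large values more than small ones, which is why it can pull in a long right tail. All three keep the cases in the same rank order.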


Slide 9

Assumption of Normality: Computing Transformations

We will use SPSS scripts as described below to test assumptions and compute transformations.

For additional details on the mechanics of computing transformations, see Computing Transformations


Slide 10

Assumption of Normality: When transformations do not work

When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.

We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.


Slide 11

Assumption of Normality: Computing Explore descriptive statistics

To compute the statistics needed for evaluating the normality of a variable, select the Explore command from


Slide 12

Assumption of Normality: Adding the variable to be evaluated

First, click on the variable to be included in the analysis to highlight it.

Second, click on the right arrow button to move the highlighted variable to the Dependent List.


Slide 13

Assumption of Normality: Selecting statistics to be computed

To select the statistics for the output, click on the Statistics command button.


Slide 14

Assumption of Normality: Including descriptive statistics

First, click on the Descriptives checkbox to select it. Clear the other checkboxes.

Second, click on the Continue button to complete the request for statistics.


Slide 15

Assumption of Normality: Selecting charts for the output

To select the diagnostic charts for the output, click on the Plots command button.


Slide 16

Assumption of Normality: Including diagnostic plots and statistics

First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.

Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.

Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.

Finally, click on the Continue button to complete the request.


Slide 17

Assumption of Normality: Completing the specifications for the analysis

Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.


Slide 18

Assumption of Normality: The histogram

[Figure: histogram of TOTAL TIME SPENT ON THE INTERNET; Mean = 10.7, Std. Dev = 15.35, N = 93.]

An initial impression of the normality of the distribution can be gained by examining the histogram.

In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.


Slide 19

Assumption of Normality: The normality plot

[Figure: normal Q-Q plot of TOTAL TIME SPENT ON THE INTERNET, plotted against Observed Value.]

The problem with the normality of this variable's distribution is reinforced by the normality plot.

If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.


Slide 20

Assumption of Normality: The test of normality

Tests of Normality

                                    Kolmogorov-Smirnov(a)      Shapiro-Wilk
                                    Statistic   df    Sig.     Statistic   df    Sig.
TOTAL TIME SPENT ON THE INTERNET    .246        93    .000     .606        93    .000

a. Lilliefors Significance Correction

Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.

The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as


Slide 21

Assumption of Normality: The rule of thumb for skewness and kurtosis

Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                                  Statistic   Std. Error
Mean                                              10.7312     1.59183
95% Confidence Interval for Mean   Lower Bound    7.5697
                                   Upper Bound    13.8927
5% Trimmed Mean                                   8.2949
Median                                            5.5000
Variance                                          235.655
Std. Deviation                                    15.35106
Minimum                                           .20
Maximum                                           102.00
Range                                             101.80
Interquartile Range                               10.2000
Skewness                                          3.532       .250
Kurtosis                                          15.614      .495

Using the rule of thumb for evaluating normality with the skewness and kurtosis statistics, we look at the table of descriptive statistics.

The skewness (3.532) and kurtosis (15.614) for the variable both exceed the rule of thumb criterion of 1.0. The variable is not normally distributed.
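The Std. Error column for skewness and kurtosis depends only on the number of cases. As a hedged check, the sketch below uses the usual large-sample standard error formulas for these two statistics (our assumption about what SPSS computes; the function names are hypothetical) with N = 93 from the histogram slide.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis for n cases."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

print(round(se_skewness(93), 3))  # 0.25
print(round(se_kurtosis(93), 3))  # 0.495

# Applying the rule of thumb to the reported statistics.
skewness, kurtosis = 3.532, 15.614
print(abs(skewness) <= 1.0 and abs(kurtosis) <= 1.0)  # False
```

Both values agree with the .250 and .495 shown in the Descriptives table, which suggests the table was computed on the 93 valid cases.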


Slide 22

Assumption of Linearity

Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.

Linearity characterizes the relationship between two metric variables. It is tested for the pairs formed by the dependent variable and each metric independent variable in the analysis.

There are relationships that are not linear.

The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.

Population growth may not be linear. The pattern often shows growth at increasing rates over time.


Slide 23

Assumption of Linearity: Evaluating linearity

There are both graphical and statistical methods for evaluating linearity.

Graphical methods include the examination of scatterplots, often overlaid with a trendline. While commonly recommended, this strategy is difficult to implement.

Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.


Slide 24

Assumption of Linearity: Interpreting scatterplots

[Figure: scatterplot with RESPONDENT'S SOCIOECONOMIC INDEX on the horizontal axis.]

The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.


Slide 25

Assumption of Linearity: Interpreting scatterplots

[Figure: scatterplot with Gross domestic product / capita on the horizontal axis.]

Sometimes, a scatterplot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatterplot.


Slide 26

Assumption of Linearity: Scatterplots that are difficult to interpret

[Figure: two scatterplots, one with AGE OF RESPONDENT and one with HOURS PER DAY WATCHING TV on the horizontal axis.]

The correlations for both of these relationships are low.

The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatterplots.


Slide 27

Assumption of Linearity: Using correlation matrices

Correlations

Variables: (1) TOTAL TIME SPENT ON THE INTERNET; (2) AGE OF RESPONDENT; (3) Logarithm of AGE [LG10(AGE)]; (4) Square Root of AGE [SQRT(AGE)]; (5) Inverse of AGE [-1/(AGE)]

                              (1)      (2)      (3)      (4)      (5)
(1)  Pearson Correlation      1        .017     .048     .032     .079
     Sig. (2-tailed)          .        .874     .648     .761     .453
     N                        93       93       93       93       93
(2)  Pearson Correlation      .017     1        .979**   .995**   .916**
     Sig. (2-tailed)          .874     .        .000     .000     .000
     N                        93       270      270      270      270
(3)  Pearson Correlation      .048     .979**   1        .994**   .978**
     Sig. (2-tailed)          .648     .000     .        .000     .000
     N                        93       270      270      270      270
(4)  Pearson Correlation      .032     .995**   .994**   1        .951**
     Sig. (2-tailed)          .761     .000     .000     .        .000
     N                        93       270      270      270      270
(5)  Pearson Correlation      .079     .916**   .978**   .951**   1
     Sig. (2-tailed)          .453     .000     .000     .000     .
     N                        93       270      270      270      270

**. Correlation is significant at the 0.01 level (2-tailed).

Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret.

The information that we need is in the first column of the matrix, which shows the correlation and significance for the dependent variable and all forms of the independent variable.
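The first column of such a matrix can be sketched in a few lines of Python. This is only an illustration with hypothetical function names and made-up data: it computes Pearson's r between the dependent variable and each form of the independent variable, assumes complete pairs of positive values (SPSS handles missing cases itself, which is why the N values in the table differ), and leaves out the significance tests.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_column(dependent, independent):
    """Correlate the dependent variable with each form of the independent variable."""
    forms = {
        "original": list(independent),
        "log10": [math.log10(x) for x in independent],
        "sqrt": [math.sqrt(x) for x in independent],
        "inverse": [-1.0 / x for x in independent],
    }
    return {name: pearson_r(vals, dependent) for name, vals in forms.items()}

# Hypothetical data in which y is exactly linear in log10(x).
x = [1.0, 10.0, 100.0, 1000.0]
y = [0.0, 1.0, 2.0, 3.0]
for name, r in first_column(y, x).items():
    print(name, round(r, 3))
```

Reading down the printed values is the same exercise as reading down the first column of the SPSS matrix: we look for the form of the independent variable with the strongest correlation.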


Slide 28

Assumption of Linearity: The pattern of correlations for no relationship

[Table: the same Correlations matrix as on Slide 27, for TOTAL TIME SPENT ON THE INTERNET and AGE OF RESPONDENT with its logarithm, square root, and inverse transformations.]

The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

Moreover, none of the significance tests for the correlations with the transformed independent variable is statistically significant. There is no relationship between these variables; it is not a problem with non-linearity.


Slide 29

Assumption of Linearity: Correlation pattern suggesting transformation

Correlations

Variables: (1) Infant mortality rate; (2) population; (3) Logarithm of POP [LG10(POP)]; (4) Square Root of POP [SQRT(POP)]; (5) Inverse of POP [-1/(POP)]

                              (1)      (2)      (3)      (4)      (5)
(1)  Pearson Correlation      1        .021     .258**   .066     .198**
     Sig. (2-tailed)          .        .762     .000     .331     .003
     N                        219      219      219      219      219
(2)  Pearson Correlation      .021     1        .282**   .897**   .038
     Sig. (2-tailed)          .762     .        .000     .000     .572
     N                        219      219      219      219      219
(3)  Pearson Correlation      .258**   .282**   1        .600**   .606**
     Sig. (2-tailed)          .000     .000     .        .000     .000
     N                        219      219      219      219      219
(4)  Pearson Correlation      .066     .897**   .600**   1        .164*
     Sig. (2-tailed)          .331     .000     .000     .        .015
     N                        219      219      219      219      219
(5)  Pearson Correlation      .198**   .038     .606**   .164*    1
     Sig. (2-tailed)          .003     .572     .000     .015     .
     N                        219      219      219      219      219

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

However, the probability associated with the larger correlation for the logarithmic transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.


Slide 30

Assumption of Linearity: Correlation pattern suggesting substitution

Should it happen that the correlation between a transformed independent variable and the dependent variable is substantially stronger than the relationship between the untransformed independent variable and the dependent variable, the transformation should be considered even if the relationship involving the untransformed independent variable is statistically significant.

A difference of +0.20 or -0.20, or more, would be considered substantial enough, since a change of this size would alter our interpretation of the relationship.
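The 0.20 guideline can be written as a one-line check. The sketch below is an illustration, not part of the SPSS script: the function name is hypothetical, and comparing the absolute sizes of the two correlations is our reading of "substantially stronger". The example values are the correlations with infant mortality rate from the Slide 29 matrix.

```python
def prefer_transformation(r_original, r_transformed, threshold=0.20):
    """True when the transformed correlation is stronger by at least the threshold."""
    # Rounding guards against floating-point noise; r is reported to 3 decimals.
    gain = round(abs(r_transformed) - abs(r_original), 6)
    return gain >= threshold

# population: r = .021; LG10(POP): r = .258; SQRT(POP): r = .066
print(prefer_transformation(0.021, 0.258))  # True
print(prefer_transformation(0.021, 0.066))  # False
```

By this reading, the logarithm of population clears the threshold while the square root does not.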


Slide 31

Assumption of Linearity: Transformations

When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.

Three common transformations to induce linearity are: the logarithmic transformation, the square root transformation, and the inverse transformation.

All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.


Slide 32

Assumption of Linearity: When transformations do not work

When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship, i.e. we lose power.

We do have the option of changing the way the information in the variables is represented, e.g. substitute several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.


Slide 33

Assumption of Linearity: Creating the scatterplot

Suppose we are interested in the linearity of the relationship between "hours per day watching TV" and "total hours spent on the Internet".

The most commonly recommended strategy for evaluating linearity is visual examination of a scatter plot.

To obtain a scatter plot in SPSS, select the Scatter command from the Graphs menu.


Slide 34

Assumption of Linearity: Selecting the type of scatterplot

First, click on the thumbnail sketch of a simple scatterplot to highlight it.

Second, click on the Define button to specify the variables to be included in the scatterplot.


Slide 35

Assumption of Linearity: Selecting the variables

First, move the dependent variable netime to the Y Axis text box.

Second, move the independent variable tvhours to the X Axis text box.

Third, click on the OK button to complete the specifications for the scatterplot.

If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.


Slide 36

Assumption of Linearity: The scatterplot

The scatterplot is produced in the SPSS output viewer.

The points in a scatterplot are considered linear if they form a cigar-shaped elliptical band.

The pattern in this scatterplot is not really clear.


Slide 37

To try to determine if the relationship is linear, we can add a trendline to the chart.

To add a trendline to the chart, we need to open the chart for editing.

To open the chart for editing, double click on it.


Slide 38

Assumption of Linearity: The scatterplot in the SPSS Chart Editor

The chart that we double clicked on is opened for editing in the SPSS Chart Editor.

To add the trendline, select the


Slide 39

Assumption of Linearity: Requesting the fit line

In the Scatterplot Options dialog box, we click on the Total checkbox in the Fit Line panel in order to request the trend line.

Click on the Fit Options button to request the r² coefficient of determination as a measure of the strength of the relationship.


Slide 40

Assumption of Linearity: Requesting r²

First, the Linear regression thumbnail sketch should be highlighted as the type of fit line to be added to the chart.

Second, click on the Display R-square in Legend checkbox to add this item to our output.

Third, click on the Continue button to complete the options request.


Slide 42

Assumption of Linearity: The fit line and r²

The red fit line is added to the chart.

The value of r² (0.0460) suggests that the relationship is weak.


Slide 43

Assumption of Linearity: Computing the transformations

There are four transformations that we can use to achieve or improve linearity.

The compute dialogs for these four transformations for linearity are shown.


Slide 44

Assumption of Linearity: Creating the scatterplot matrix

To create the scatterplot matrix, select the Scatter command in the Graphs menu.


Slide 45

Assumption of Linearity: Selecting type of scatterplot

First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want.

Second, click on the Define button to select the variables for the scatterplot.


Slide 46

Assumption of Linearity: Specifications for scatterplot matrix

First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box.

Second, click on the OK button to produce the scatterplot.


Slide 47

Assumption of Linearity: The scatterplot matrix

[Figure: scatterplot matrix of TOTAL TIME SPENT ON THE INTERNET, HOURS PER DAY WATCHING TV, LGTVHOUR, SQTVHOUR, INTVHOUR, and S2TVHOUR.]

The scatterplot matrix shows a thumbnail sketch of scatterplots for each independent variable or transformation with the dependent variable. The scatterplot matrix may suggest which transformations might be useful.


Slide 48

Assumption of Linearity: Creating the correlation matrix

To create the correlation matrix, select the Correlate | Bivariate command in the Analyze


Slide 49

Assumption of Linearity: Specifications for correlation matrix

First, move the dependent variable, the independent variable and all of the transformations to the Variables list box.

Second, click on the OK button to produce the correlation matrix.


Slide 50

Assumption of Linearity: The correlation matrix

Correlations

Variables: (1) TOTAL TIME SPENT ON THE INTERNET; (2) HOURS PER DAY WATCHING TV; (3) LGTVHOUR; (4) SQTVHOUR; (5) INTVHOUR; (6) S2TVHOUR

                              (1)      (2)      (3)      (4)      (5)      (6)
(1)  Pearson Correlation      1        .215     .104     .156     .045     .328**
     Sig. (2-tailed)          .        .079     .397     .203     .713     .006
     N                        93       68       68       68       68       68
(2)  Pearson Correlation      .215     1        .874**   .967**   .626**   .903**
     Sig. (2-tailed)          .079     .        .000     .000     .000     .000
     N                        68       160      160      160      160      160
(3)  Pearson Correlation      .104     .874**   1        .967**   .910**   .611**
     Sig. (2-tailed)          .397     .000     .        .000     .000     .000
     N                        68       160      160      160      160      160
(4)  Pearson Correlation      .156     .967**   .967**   1        .784**   .774**
     Sig. (2-tailed)          .203     .000     .000     .        .000     .000
     N                        68       160      160      160      160      160
(5)  Pearson Correlation      .045     .626**   .910**   .784**   1        .335**
     Sig. (2-tailed)          .713     .000     .000     .000     .        .000
     N                        68       160      160      160      160      160
(6)  Pearson Correlation      .328**   .903**   .611**   .774**   .335**   1
     Sig. (2-tailed)          .006     .000     .000     .000     .000     .
     N                        68       160      160      160      160      160

**. Correlation is significant at the 0.01 level (2-tailed).

The answers to the problems are based on the correlation matrix.

Before we answer the question in this problem, we will use a script to produce the output.


Slide 51

Assumption of Homoscedasticity

Homoscedasticity refers to the assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable.

While it applies to independent variables at all three measurement levels, the methods that we will use to evaluate homoscedasticity require that the independent variable be non-metric (nominal or ordinal) and the dependent variable be metric (ordinal or interval). When both variables are metric, the assumption is evaluated as part of the residual analysis in multiple regression.


Slide 52

Assumption of Homoscedasticity: Evaluating homoscedasticity

Homoscedasticity is evaluated for pairs of variables.

There are both graphical and statistical methods for evaluating homoscedasticity.

The graphical method is called a boxplot.

The statistical method is the Levene statistic, which SPSS computes for the test of homogeneity of variances.

Neither of the methods is absolutely definitive.


Slide 53

Assumption of Homoscedasticity: The boxplot

[Figure: boxplots by MARITAL STATUS (NEVER MARRIED, SEPARATED, DIVORCED, WIDOWED, MARRIED), with outlying cases labeled by case number.]

Each red box shows the middle 50% of the cases for the group, indicating how spread out the group of scores is.

If the variance across the groups is equal, the height of the red boxes will be similar across the groups.

If the heights of the red boxes are different, the plot suggests that the variance across groups is not homogeneous.

The married group is more spread out than the other groups, suggesting unequal variance.
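The height of each box is the interquartile range of its group, so the comparison the boxplot invites can be sketched numerically. The Python below is an illustration with a hypothetical function name and made-up scores; it uses the standard library's inclusive quartile method, which may differ slightly from the way SPSS places the box edges.

```python
import statistics

def box_heights(groups):
    """Return the interquartile range (height of the box) for each group."""
    heights = {}
    for name, scores in groups.items():
        q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        heights[name] = q3 - q1
    return heights

# Hypothetical scores for two groups with different spread.
groups = {
    "group A": [1, 2, 3, 4, 5],
    "group B": [1, 3, 5, 7, 9],
}
print(box_heights(groups))
```

Clearly unequal heights point in the same direction as unequal boxes on the plot, but like the plot itself they are only suggestive.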


Slide 54

Assumption of Homoscedasticity: Levene test of the homogeneity of variance

Test of Homogeneity of Variances

RS HIGHEST DEGREE

Levene Statistic   df1   df2   Sig.
5.239              4     262   .000

The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.
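To make the statistic less of a black box, here is a minimal Python sketch of the Levene statistic. It assumes the mean-centered form of the test (our understanding of what SPSS reports), uses a hypothetical function name and made-up data, and returns only the statistic and its degrees of freedom; the Sig. value requires the F distribution, which is not computed here.

```python
def levene_statistic(groups):
    """Return (W, df1, df2) for the mean-centered Levene test.

    groups is a list of lists, one list of scores per category of the
    independent variable.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = []
    for g in groups:
        mean = sum(g) / len(g)
        z.append([abs(v - mean) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2

# Hypothetical scores: the second group is far more spread out than the first.
w, df1, df2 = levene_statistic([[1, 2, 3], [0, 5, 10]])
print(round(w, 3), df1, df2)
```

The degrees of freedom are the number of groups minus one and the number of cases minus the number of groups, which is where the 4 and 262 in the table above come from with five marital status categories.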

Since the probability associated with the Levene Statistic(


Slide 55

Assumption of Homoscedasticity: Transformations

When the assumption of homoscedasticity is not supported, we can transform the dependent variable and test it for homoscedasticity. If the transformed variable demonstrates homoscedasticity, we can substitute it in our analysis.

We use the same three common transformations that we used for normality: the logarithmic transformation, the square root transformation, and the inverse transformation.

All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.


Slide 56

Assumption of Homoscedasticity: When transformations do not work

When none of the transformations results in homoscedasticity for the variables in the relationship, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.


Slide 57

Assumption of Homoscedasticity: Request a boxplot

Suppose we want to test for homogeneity of variance: whether the variance in "highest academic degree" is homogeneous for the categories of "marital status."

The boxplot provides a visual image of the distribution of the dependent variable for the groups defined by the independent variable.

To request a boxplot, choose the BoxPlot command from the Graphs menu.


Slide 58

Assumption of Homoscedasticity: Specify the type of boxplot

First, click on the Simple style of boxplot to highlight it with a rectangle around the thumbnail drawing.

Second, click on the Define button to specify the variables to be plotted.

Slide 59

Assumption of Homoscedasticity: Specify the dependent variable

First, click on the dependent variable to highlight it.

Second, click on the right arrow button to move the dependent variable to the Variable text box.

Slide 60

Assumption of Homoscedasticity: Specify the independent variable

First, click on the independent variable to highlight it.

Second, click on the right arrow button to move the independent variable to the Category Axis text box.

Slide 61

Assumption of Homoscedasticity: Complete the request for the boxplot

To complete the request for the boxplot, click on the OK button.

Slide 63

Assumption of Homoscedasticity: Request the test for homogeneity of variance

To compute the Levene test for homogeneity of variance, select the Compare Means | One-Way ANOVA command from the Analyze menu.

Slide 65

Assumption of Homoscedasticity: Specify the dependent variable

First, click on the dependent variable to highlight it.

Second, click on the right arrow button to move the dependent variable to the Dependent List text box.

Slide 66

Assumption of Homoscedasticity: The homogeneity of variance test is an option

Click on the Options button to open the options dialog box.

Slide 68

Assumption of Homoscedasticity: Complete the request for output

Click on the OK button to complete the request for the homogeneity of variance test through the One-Way ANOVA procedure.

Slide 69

Assumption of Homoscedasticity: Interpreting the homogeneity of variance test

Test of Homogeneity of Variances

RS HIGHEST DEGREE

Levene Statistic    df1    df2    Sig.
5.239               4      262    .000

The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous.

Since the probability associated with the Levene Statistic(

Slide 70

Using scripts

The process of evaluating assumptions requires numerous SPSS procedures and outputs that are time consuming to produce.

These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.

Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating assumptions.

Slide 71

Using a script for evaluating assumptions

The script EvaluatingAssumptionsAndMissingData.exe will produce all of the output we have used for evaluating assumptions.

Navigate to the link SPSS Scripts and Syntax on the course web page.

Download EvaluatingAssumptionsAndMissingData.exe to your computer and install it, following the directions on the web page.

Slide 73

Invoke the script in SPSS

To invoke the script, select the Run Script command in the Utilities menu.

Slide 74

Select the script

First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder.

If you only see a file with an .EXE extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.

Second, click on the script name to highlight it.

Third, click on the Run button to start the script.

Slide 75

The script dialog

The script dialog box acts similarly to SPSS dialog boxes. You select the variables to include in the analysis and choose options for the output.

Slide 76

Complete the specifications - 1

Move the dependent and independent variables from the list of variables to the list boxes. Metric and nonmetric variables are moved to separate lists so the computer knows how you want them treated.

You must also indicate the level of measurement for the dependent variable. By default the metric option button is marked.

Slide 77

Complete the specifications - 2

Mark the option button for the type of output you want the script to compute.

Select the transformations to be tested.

Click on the OK button to produce the output.

Slide 78

The script finishes

If your SPSS output viewer is open, you will see the output produced in that window.

Since it may take a while to produce the output, and since there are times when it appears that nothing is happening, there is an alert to tell you when the script is finished.

Unless you are absolutely sure something has gone wrong, let the script run until you see this alert.

When you see this alert, click on the OK button.

Slide 79

Output from the script - 1

The script will produce lots of output. Additional descriptive material in the

Scroll through the script to locate the outputs needed to answer the question.

Slide 80

Closing the script dialog box

The script dialog box does not close automatically because we often want to run another test right away. There are two methods for closing the dialog box.

Click on the Cancel button to close the script.

Click on the X close box to close the script.

Slide 81

Problem 1

In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.

In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

1. True

2. True with caution

3. False

4. Inappropriate application of a statistic

Slide 82

Level of measurement

In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.

In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

Since we are pre-screening for a multiple regression problem, we should make sure we satisfy the level of measurement before proceeding.

"Total hours spent on the Internet" [netime] is interval, satisfying the metric level of measurement requirement for the dependent variable.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Income" [rincom98] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables. Since some data analysts do not agree with this convention of treating an ordinal variable as metric, a note of caution should be included in our interpretation.

Slide 84

Run the script to test normality - 2

First, navigate to the SW388R7 folder on your computer.

Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS

Third, click on the Run button to open the script.

Slide 85

Run the script to test normality - 3

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.

Slide 86

Normality of the dependent variable

Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                     Statistic    Std. Error
Mean                                 10.7312      1.59183
95% Confidence Interval for Mean
  Lower Bound                        7.5697
  Upper Bound                        13.8927
5% Trimmed Mean                      8.2949
Median                               5.5000
Variance                             235.655
Std. Deviation                       15.35106
Minimum                              .20
Maximum                              102.00
Range                                101.80
Interquartile Range                  10.2000
Skewness                             3.532        .250
Kurtosis                             15.614       .495

The dependent variable "total hours spent on the Internet" [netime] did not satisfy the criteria for a normal distribution. Both the skewness (3.532) and kurtosis (15.614) fell outside the range from -1.0 to +1.0.
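The rule of thumb used here (skewness and kurtosis both between -1.0 and +1.0) can be sketched in Python as follows. This is an illustrative sketch, not the course script. It uses the bias-adjusted sample formulas for skewness and excess kurtosis; that these match the values SPSS prints in the Descriptives table is an assumption. The function names and the sample data are hypothetical.

```python
import math

def _mean_and_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, sd

def sample_skewness(xs):
    """Bias-adjusted sample skewness; needs at least 3 cases."""
    n = len(xs)
    mean, sd = _mean_and_sd(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)

def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis; needs at least 4 cases."""
    n = len(xs)
    mean, sd = _mean_and_sd(xs)
    fourth = sum(((x - mean) / sd) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def meets_normality_rule(xs, low=-1.0, high=1.0):
    """True when both skewness and kurtosis fall inside [low, high]."""
    return (low <= sample_skewness(xs) <= high
            and low <= sample_excess_kurtosis(xs) <= high)

# Hypothetical right-skewed data with one very large value.
skewed = [0.2, 1, 2, 3, 5.5, 8, 10, 15, 40, 102]
print(meets_normality_rule(skewed))
```

A single extreme case, like the maximum of 102 hours in the table above, is often enough to push both statistics outside the range.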

Slide 87

Normality of transformed dependent variable

Since "total hours spent on the Internet" [netime] did not satisfy the criteria for normality, we examine the skewness and kurtosis of each of the transformations to see if any of them satisfy the criteria.

The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.150) was between -1.0 and +1.0 and the kurtosis of the distribution (0.127) was between -1.0 and +1.0.

The "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was substituted for "total hours spent on the Internet" [netime] in the analysis.

Slide 89

Normality of the independent variables - 2

Descriptives: RESPONDENTS INCOME

                                     Statistic    Std. Error
Mean                                 13.35        .419
95% Confidence Interval for Mean
  Lower Bound                        12.52
  Upper Bound                        14.18
5% Trimmed Mean                      13.54
Median                               15.00
Variance                             29.535
Std. Deviation                       5.435
Minimum                              1
Maximum                              23
Range                                22
Interquartile Range                  8.00
Skewness                             -.686        .187
Kurtosis                             -.253        .373

The independent variable "income" [rincom98] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.686) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.253) was between -1.0 and +1.0.

Slide 90

Run the script to test linearity - 1

If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

When the linearity option is selected, a default set of transformations to test is marked.

Slide 91

Run the script to test linearity - 2

Since we have already decided to use the log of the dependent variable to satisfy normality, that is the form of the dependent variable we want to evaluate with the independent variables. Mark this checkbox for the dependent variable and clear the others.

Click on the OK button to produce the output.

Slide 92

Linearity test with age of respondent

Correlations

Columns: (1) Logarithm of NETIME [LG10(NETIME)], (2) AGE OF RESPONDENT, (3) Logarithm of AGE [LG10(AGE)], (4) Square Root of AGE [SQRT(AGE)], (5) Inverse of AGE [-1/(AGE)]

                                          (1)      (2)      (3)      (4)      (5)
(1) Logarithm of NETIME [LG10(NETIME)]
    Pearson Correlation                   1        .074     .119     .096     .164
    Sig. (2-tailed)                       .        .483     .257     .362     .116
    N                                     93       93       93       93       93
(2) AGE OF RESPONDENT
    Pearson Correlation                   .074     1        .979**   .995**   .916**
    Sig. (2-tailed)                       .483     .        .000     .000     .000
    N                                     93       270      270      270      270
(3) Logarithm of AGE [LG10(AGE)]
    Pearson Correlation                   .119     .979**   1        .994**   .978**
    Sig. (2-tailed)                       .257     .000     .        .000     .000
    N                                     93       270      270      270      270
(4) Square Root of AGE [SQRT(AGE)]
    Pearson Correlation                   .096     .995**   .994**   1        .951**
    Sig. (2-tailed)                       .362     .000     .000     .        .000
    N                                     93       270      270      270      270
(5) Inverse of AGE [-1/(AGE)]
    Pearson Correlation                   .164     .916**   .978**   .951**   1
    Sig. (2-tailed)                       .116     .000     .000     .000     .
    N                                     93       270      270      270      270

**. Correlation is significant at the 0.01 level (2-tailed).

The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "age" [age] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=0.074, p=0.483), the logarithmic transformation (r=0.119, p=0.257), the square root transformation (r=0.096, p=0.362), and the inverse transformation (r=0.164, p=0.116), were all greater than the level of significance for testing assumptions (0.01).

There was no evidence that the assumption of linearity was violated.
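The reasoning on this slide can be written as a small decision function. The sketch below is illustrative only; the function name and return strings are hypothetical. Two branches come from the deck: a significant correlation with the untransformed variable is read as linear, and no significant correlation in any form is read as weak rather than nonlinear. The middle branch (only a transformed form is significant) is an assumption about how the remaining case would be handled.

```python
def assess_linearity(p_untransformed, p_transformed, alpha=0.01):
    """Classify a DV-IV relationship from correlation p-values.

    p_untransformed: p-value for the correlation with the original IV.
    p_transformed:   dict mapping transformation name to its p-value.
    alpha:           level of significance for testing assumptions.
    """
    if p_untransformed <= alpha:
        return "linear"
    significant = [name for name, p in p_transformed.items() if p <= alpha]
    if significant:
        # Assumed handling: a transformed form shows a relationship
        # that the original form does not.
        return "consider transformation: " + ", ".join(significant)
    return "weak relationship, no evidence of nonlinearity"

# p-values reported above for age and the log of netime.
age_result = assess_linearity(
    0.483, {"log": 0.257, "square root": 0.362, "inverse": 0.116}
)
print(age_result)
```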

Slide 93

Linearity test with respondent's income

Correlations

Columns: (1) Logarithm of NETIME [LG10(NETIME)], (2) RESPONDENTS INCOME, (3) Logarithm of Reflected Values of RINCOM98 [LG10(24-RINCOM98)], (4) Square Root of Reflected Values of RINCOM98 [SQRT(24-RINCOM98)], (5) Inverse of Reflected Values of RINCOM98 [-1/(24-RINCOM98)]

                                          (1)      (2)      (3)      (4)      (5)
(1) Logarithm of NETIME [LG10(NETIME)]
    Pearson Correlation                   1        -.053    .063     .060     .073
    Sig. (2-tailed)                       .        .658     .600     .617     .540
    N                                     93       72       72       72       72
(2) RESPONDENTS INCOME
    Pearson Correlation                   -.053    1        -.922**  -.985**  -.602**
    Sig. (2-tailed)                       .658     .        .000     .000     .000
    N                                     72       168      168      168      168
(3) Logarithm of Reflected Values of RINCOM98
    Pearson Correlation                   .063     -.922**  1        .974**   .848**
    Sig. (2-tailed)                       .600     .000     .        .000     .000
    N                                     72       168      168      168      168
(4) Square Root of Reflected Values of RINCOM98
    Pearson Correlation                   .060     -.985**  .974**   1        .714**
    Sig. (2-tailed)                       .617     .000     .000     .        .000
    N                                     72       168      168      168      168
(5) Inverse of Reflected Values of RINCOM98
    Pearson Correlation                   .073     -.602**  .848**   .714**   1
    Sig. (2-tailed)                       .540     .000     .000     .000     .
    N                                     72       168      168      168      168

**. Correlation is significant at the 0.01 level (2-tailed).

The assessment of the linear relationship between "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" and "income" [rincom98] indicated that the relationship was weak, rather than nonlinear. The statistical probabilities associated with the correlation coefficients measuring the relationship with the untransformed independent variable (r=-0.053, p=0.658), the logarithmic transformation (r=0.063, p=0.600), the square root transformation (r=0.060, p=0.617), and the inverse transformation (r=0.073, p=0.540), were all greater than the level of significance for testing assumptions (0.01).

There was no evidence that the assumption of linearity was violated.

Slide 94

Run the script to test homogeneity of variance - 1

If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity.

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

Slide 95

Run the script to test homogeneity of variance - 2

In this problem, we have already decided to use the log transformation for the dependent variable, so we only need to test it. Next, clear all of the transformation checkboxes except for Logarithmic.

Finally, click on the OK button to produce the output.

Slide 96

Levene test of homogeneity of variance

Test of Homogeneity of Variances

Logarithm of NETIME [LG10(NETIME)]

Levene Statistic    df1    df2    Sig.
.166                1      91     .685

Based on the Levene Test, the variance in "log of total hours spent on the Internet [LGNETIME=LG10(NETIME)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.166) was p=0.685, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied.
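To show where the Levene statistic and its degrees of freedom come from, here is a minimal Python sketch of the classic mean-centered Levene statistic. It is illustrative only: the function name and data are hypothetical, and the assumption that SPSS centers on group means in this output is not confirmed by the deck. The p-value, which comes from an F distribution with df1 and df2, is not computed here.

```python
def levene_statistic(groups):
    """Return (W, df1, df2) for a list of groups of numeric values.

    Each value is replaced by its absolute deviation from its own
    group mean; W is the one-way ANOVA F ratio on those deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's mean.
    z_groups = []
    for g in groups:
        center = sum(g) / len(g)
        z_groups.append([abs(y - center) for y in g])
    z_means = [sum(z) / len(z) for z in z_groups]
    z_grand = sum(sum(z) for z in z_groups) / n_total
    between = sum(len(z) * (zm - z_grand) ** 2
                  for z, zm in zip(z_groups, z_means))
    within = sum((v - zm) ** 2
                 for z, zm in zip(z_groups, z_means) for v in z)
    df1, df2 = k - 1, n_total - k
    return (df2 / df1) * between / within, df1, df2

# Hypothetical groups: same spread, then clearly different spread.
same_w, same_df1, same_df2 = levene_statistic([[1, 2, 3, 4], [11, 12, 13, 14]])
diff_w, _, _ = levene_statistic([[1, 2, 3, 4], [0, 10, 20, 30]])
```

The degrees of freedom in the table above follow the same pattern: two categories of sex give df1 = 2 - 1 = 1, and 93 cases give df2 = 93 - 2 = 91.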

Slide 97

In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

1. True

2. True with caution

3. False

4. Inappropriate application of a statistic

The logarithmic transformation of the dependent variable [LGNETIME=LG10(NETIME)] solved the only problem with normality that we encountered. In that form, the relationship with the metric independent variables was weak, but there was no evidence of nonlinearity. The variance of the log transform of the dependent variable was homogeneous for the categories of the nonmetric variable sex.

No cautions were needed because of a violation of assumptions. A caution was needed because respondent's income was ordinal level.

The answer to the problem is true with caution.

Slide 100

Run the script to test normality - 1

To run the script to test assumptions, choose the Run Script command from the Utilities menu.

Slide 101

Run the script to test normality - 2

First, navigate to the SW388R7 folder on your computer.

Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS

Third, click on the Run button to open the script.

Slide 104

Normality of the first independent variable

Descriptives: Population growth rate

                                     Statistic    Std. Error
Mean                                 1.4944       .09456
95% Confidence Interval for Mean
  Lower Bound                        1.3081
  Upper Bound                        1.6808
5% Trimmed Mean                      1.4365
Median                               1.4000
Variance                             1.958
Std. Deviation                       1.39929
Minimum                              -1.14
Maximum                              13.39
Range                                14.53
Interquartile Range                  1.8000
Skewness                             2.885        .164
Kurtosis                             22.665       .327

The independent variable "population growth rate" [pgrowth] did not satisfy the criteria for a normal distribution. Both the skewness (2.885) and kurtosis (22.665) fell outside the range from -1.0 to +1.0.

Slide 105

Normality of transformed independent variable

Neither the logarithmic (skew=-0.218, kurtosis=1.277), the square root (skew=0.873, kurtosis=5.273), nor the inverse transformation (skew=-1.836, kurtosis=5.763) induced normality in the variable "population growth rate" [pgrowth].

A caution was added to the findings.

Slide 107

Normality of transformed independent variable

Since the distribution was skewed to the left, it was necessary to reflect, or reverse code, the values for the variable before computing the transformation.

The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.567) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.964) was between -1.0 and +1.0. The "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was substituted for "percent of the total population who was literate" [literacy] in the analysis.
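Reflection can be sketched in a few lines of Python. This is an illustrative sketch, not the course script: the function name and the literacy values are hypothetical, and the constant 101 is taken from the SQRT(101-LITERACY) label above.

```python
import math

def reflect(values, constant):
    """Reverse-code values by subtracting each one from a constant.
    The constant should exceed the largest value so results stay positive."""
    return [constant - v for v in values]

# Hypothetical literacy percentages, bunched toward the top of the scale.
literacy = [35.0, 60.0, 82.0, 95.0, 100.0]

reflected = reflect(literacy, 101)            # a left tail becomes a right tail
sq_literacy = [math.sqrt(v) for v in reflected]
```

Because reflection reverses the order of the cases, the sign of any correlation with the transformed variable flips. The deck's correlation table for literacy shows this: r = .724 for the original variable and r = -.720 for the reflected square root.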

Slide 108

Normality of the third independent variable

Descriptives: Per capita GDP

                                     Statistic    Std. Error
Mean                                 8554.43      580.523
95% Confidence Interval for Mean
  Lower Bound                        7410.27
  Upper Bound                        9698.59
5% Trimmed Mean                      7818.67
Median                               5000.00
Variance                             7.4E+07
Std. Deviation                       8590.954
Minimum                              510
Maximum                              36400
Range                                35890
Interquartile Range                  11200.00
Skewness                             1.207        .164
Kurtosis                             .475         .327

The independent variable "per capita GDP" [gdp] did not satisfy the criteria for a normal distribution. The kurtosis of the distribution (0.475) was between -1.0 and +1.0, but the skewness of the distribution (1.207) fell outside the range from -1.0 to +1.0.

Slide 109

Normality of transformed independent variable

The "square root of per capita GDP [SQGDP=SQRT(GDP)]" satisfied the criteria for a normal distribution. The skewness of the distribution (0.614) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.773) was between -1.0 and +1.0.

The "square root of per capita GDP [SQGDP=SQRT(GDP)]" was substituted for "per capita GDP" [gdp] in the analysis.

Slide 110

Run the script to test linearity - 1

If the script was not closed after it was used for normality, we can take advantage of the specifications already entered. If the script was closed, re-open it as you would for normality.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

When the linearity option is selected, a default set of transformations to test is marked.

Click on the OK button to produce the output.

Slide 111

Linearity test with population growth rate

Correlations

Columns: (1) Life expectancy at birth - total population, (2) Population growth rate, (3) Logarithm of PGROWTH [LG10(2.14+PGROWTH)], (4) Square Root of PGROWTH [SQRT(2.14+PGROWTH)], (5) Inverse of PGROWTH [-1/(2.14+PGROWTH)]

                                          (1)      (2)      (3)      (4)      (5)
(1) Life expectancy at birth - total population
    Pearson Correlation                   1        -.262**  -.314**  -.301**  -.282**
    Sig. (2-tailed)                       .        .000     .000     .000     .000
    N                                     219      219      219      219      219
(2) Population growth rate
    Pearson Correlation                   -.262**  1        .930**   .979**   .801**
    Sig. (2-tailed)                       .000     .        .000     .000     .000
    N                                     219      219      219      219      219
(3) Logarithm of PGROWTH
    Pearson Correlation                   -.314**  .930**   1        .985**   .956**
    Sig. (2-tailed)                       .000     .000     .        .000     .000
    N                                     219      219      219      219      219
(4) Square Root of PGROWTH
    Pearson Correlation                   -.301**  .979**   .985**   1        .897**
    Sig. (2-tailed)                       .000     .000     .000     .        .000
    N                                     219      219      219      219      219
(5) Inverse of PGROWTH
    Pearson Correlation                   -.282**  .801**   .956**   .897**   1
    Sig. (2-tailed)                       .000     .000     .000     .000     .
    N                                     219      219      219      219      219

**. Correlation is significant at the 0.01 level (2-tailed).

The assessment of the linearity of the relationship between "life expectancy at birth" [lifeexp] and "population growth rate" [pgrowth] indicated that the relationship could be considered linear because the probability associated with the correlation coefficient for the relationship (r=-0.262) was statistically significant (p

Slide 112

Correlations

Columns: (1) Life expectancy at birth - total population, (2) Percent literate - total population, (3) Logarithm of Reflected Values of LITERACY [LG10(101-LITERACY)], (4) Square Root of Reflected Values of LITERACY [SQRT(101-LITERACY)], (5) Inverse of Reflected Values of LITERACY [-1/(101-LITERACY)]

                                          (1)      (2)      (3)      (4)      (5)
(1) Life expectancy at birth - total population
    Pearson Correlation                   1        .724**   -.670**  -.720**  -.467**
    Sig. (2-tailed)                       .        .000     .000     .000     .000
    N                                     219      203      203      203      203
(2) Percent literate - total population
    Pearson Correlation                   .724**   1        -.895**  -.978**  -.594**
    Sig. (2-tailed)                       .000     .        .000     .000     .000
    N                                     203      203      203      203      203
(3) Logarithm of Reflected Values of LITERACY
    Pearson Correlation                   -.670**  -.895**  1        .966**   .857**
    Sig. (2-tailed)                       .000     .000     .        .000     .000
    N                                     203      203      203      203      203
(4) Square Root of Reflected Values of LITERACY
    Pearson Correlation                   -.720**  -.978**  .966**   1        .717**
    Sig. (2-tailed)                       .000     .000     .000     .        .000
    N                                     203      203      203      203      203
(5) Inverse of Reflected Values of LITERACY
    Pearson Correlation                   -.467**  -.594**  .857**   .717**   1
    Sig. (2-tailed)                       .000     .000     .000     .000     .
    N                                     203      203      203      203      203

**. Correlation is significant at the 0.01 level (2-tailed).

The transformation "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was incorporated in the analysis in the evaluation of normality.

Additional transformations for linearity were not considered.

Slide 113

Linearity test with per capita GDP

Correlations

Columns: (1) Life expectancy at birth - total population, (2) Per capita GDP, (3) Logarithm of GDP [LG10(GDP)], (4) Square Root of GDP [SQRT(GDP)], (5) Inverse of GDP [-1/(GDP)]

                                          (1)      (2)      (3)      (4)      (5)
(1) Life expectancy at birth - total population
    Pearson Correlation                   1        .643**   .762**   .713**   .727**
    Sig. (2-tailed)                       .        .000     .000     .000     .000
    N                                     219      219      219      219      219
(2) Per capita GDP
    Pearson Correlation                   .643**   1        .898**   .978**   .637**
    Sig. (2-tailed)                       .000     .        .000     .000     .000
    N                                     219      219      219      219      219
(3) Logarithm of GDP [LG10(GDP)]
    Pearson Correlation                   .762**   .898**   1        .969**   .890**
    Sig. (2-tailed)                       .000     .000     .        .000     .000
    N                                     219      219      219      219      219
(4) Square Root of GDP [SQRT(GDP)]
    Pearson Correlation                   .713**   .978**   .969**   1        .762**
    Sig. (2-tailed)                       .000     .000     .000     .        .000
    N                                     219      219      219      219      219
(5) Inverse of GDP [-1/(GDP)]
    Pearson Correlation                   .727**   .637**   .890**   .762**   1
    Sig. (2-tailed)                       .000     .000     .000     .000     .
    N                                     219      219      219      219      219

**. Correlation is significant at the 0.01 level (2-tailed).

The transformation "square root of per capita GDP [SQGDP=SQRT(GDP)]" was incorporated in the analysis in the evaluation of normality. Additional transformations for linearity were not considered.

Slide 114

Run the script to test homogeneity of variance - 1

There were no nonmetric variables in this analysis, so the test of homogeneity of variance was not conducted.

Slide 115

In pre-screening the data for use in a multiple regression of the dependent variable "life expectancy at birth" [lifeexp] with the independent variables "population growth rate" [pgrowth], "percent of the total population who was literate" [literacy], and "per capita GDP" [gdp], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.

1. True

2. True with caution

3. False

4. Inappropriate application of a statistic

Two transformations were substituted to satisfy the assumption of normality: the "square root of percent of the total population who was literate (using reflected values) [SQLITERA=SQRT(101-LITERACY)]" was substituted for "percent of the total population who was literate" [literacy], and the "square root of per capita GDP [SQGDP=SQRT(GDP)]" was substituted for "per capita GDP" [gdp] in the analysis.

However, none of the transformations induced normality in the variable "population growth rate" [pgrowth]. A caution was added to the findings.

The answer to the problem is false. A caution was added because "population growth rate" [pgrowth] did not satisfy the assumption of normality and none of the transformations were successful in inducing normality.

Slide 116

Steps in evaluating assumptions: level of measurement

The following is a guide to the decision process for answering problems about assumptions for multiple regression:

Is the dependent variable metric and the independent variables metric or dichotomous?

No: Incorrect application of a statistic

Yes

Slide 117

Steps in evaluating assumptions: assumption of normality for metric variable

Does the dependent variable satisfy the criteria for a normal distribution?

Yes: Assumption satisfied, use untransformed variable in analysis

No: Does one or more of the transformations satisfy the criteria for a normal distribution?

Yes: Assumption satisfied, use transformed variable with smallest skew

No: Assumption not satisfied, use untransformed variable in analysis
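The decision steps above can be sketched as a short Python function. This is an illustrative sketch only; the function names are hypothetical. The -1.0 to +1.0 criterion is the one used throughout the deck, and reading "smallest skew" as smallest absolute skewness is an assumption.

```python
def within_rule(skew, kurtosis, low=-1.0, high=1.0):
    """True when both statistics fall inside the rule-of-thumb range."""
    return low <= skew <= high and low <= kurtosis <= high

def choose_form(original, transformed):
    """Pick the form of a variable to use in the analysis.

    original:    (skew, kurtosis) of the untransformed variable.
    transformed: dict mapping transformation name to (skew, kurtosis).
    Returns (form_name, assumption_satisfied).
    """
    if within_rule(*original):
        return "untransformed", True
    passing = {name: stats for name, stats in transformed.items()
               if within_rule(*stats)}
    if passing:
        # Assumed reading of "smallest skew": closest to zero.
        best = min(passing, key=lambda name: abs(passing[name][0]))
        return best, True
    return "untransformed", False

# Skewness and kurtosis values reported earlier in the deck for pgrowth.
pgrowth_choice = choose_form(
    (2.885, 22.665),
    {"log": (-0.218, 1.277), "square root": (0.873, 5.273), "inverse": (-1.836, 5.763)},
)
print(pgrowth_choice)
```

For population growth rate no transformation passes, so the untransformed variable is kept and the assumption is reported as not satisfied, matching the caution added in that problem.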

Slide 118

Steps in evaluating assumptions: assumption of linearity for metric variables

Yes

Probability of correlation (r) for relationship between IV and DV

Slide 119

Yes

Probability of Levene statistic