SW388R7 Data Analysis & Computers II Slide 1 Assumption of linearity Assumption of linearity Strategy for solving problems Producing outputs for evaluating linearity Assumption of linearity script Sample Problems

Upload: arline-gaines

Post on 23-Dec-2015


SW388R7 Data Analysis & Computers II

Slide 1

Assumption of linearity

Assumption of linearity

Strategy for solving problems

Producing outputs for evaluating linearity

Assumption of linearity script

Sample Problems


Slide 2

Assumption of linearity

The statistics that we will study this semester generally assume that the relationship between variables is linear, or they perform better if the relationships are linear.

If a relationship is nonlinear, the statistics which assume it is linear will underestimate the strength of the relationship, or fail to detect the existence of a relationship.


Slide 3

Linearity

Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.

Some relationships are not linear. The relationship between learning and time may not be linear: learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.

Population growth may not be linear. The pattern often shows growth at increasing rates over time.
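As a minimal sketch of why a curved pattern matters, the example below builds a hypothetical learning curve (the data and the names `hours` and `skill` are invented for illustration) and compares the Pearson correlation before and after a logarithmic transformation of the time variable. It only shows the direction of the effect: the linear statistic understates a relationship that is perfectly regular but curved.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical learning curve: rapid gains at first, slower gains later.
hours = list(range(1, 51))
skill = [math.log(h) for h in hours]

# Correlation with the untransformed time variable (straight line through a curve).
r_raw = pearson_r(hours, skill)

# Correlation after a logarithmic transformation of the time variable.
r_log = pearson_r([math.log(h) for h in hours], skill)

print(f"r with hours: {r_raw:.3f}   r with log(hours): {r_log:.3f}")
```

In this made-up case the relationship is exact, yet the untransformed correlation falls short of 1; only the transformed variable recovers the full strength of the relationship.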


Slide 4

Population growth in Texas

[Figure: plot of Population (y-axis, 0 to 20,000,000) by Year (x-axis, 1850 to 1990).]

The increase in population for the ten years from 1860 to 1870 (a difference of 214,364) is relatively small compared to the increase in population for the ten years from 1960 to 1970 (a difference of 1,617,053).
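The two differences reported on this slide can be turned into average yearly rates of change. The short sketch below uses only the two figures given above; the variable names are chosen for illustration. If growth were linear, the two rates would be about the same.

```python
# Ten-year population increases reported on the slide.
increase_1860_1870 = 214_364
increase_1960_1970 = 1_617_053

# Average change per year within each decade.
per_year_early = increase_1860_1870 / 10
per_year_late = increase_1960_1970 / 10

# How many times larger the later increase is than the earlier one.
ratio = increase_1960_1970 / increase_1860_1870

print(f"1860-1870: {per_year_early:,.1f} per year")
print(f"1960-1970: {per_year_late:,.1f} per year")
print(f"ratio of later to earlier increase: {ratio:.2f}")
```

The later decade adds roughly seven and a half times as many people as the earlier one, so the rate of change is not constant across the range of years.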


Slide 5

Evaluating linearity

There are both graphical and statistical methods for evaluating linearity.

Graphical methods include the examination of scatter plots, often overlaid with a trend line. While commonly recommended, this strategy is difficult to interpret.

Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
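The rule of thumb comparing r and eta can be sketched as follows. This is an illustrative implementation rather than the SPSS procedure: eta is computed here as the square root of the between-groups sum of squares divided by the total sum of squares, treating each distinct value of the independent variable as a group, and the data are hypothetical.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Linear correlation coefficient (r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eta(x, y):
    """Correlation ratio of y on x; each distinct x value is one group."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    grand_mean = sum(y) / len(y)
    ss_total = sum((yi - grand_mean) ** 2 for yi in y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)

# Hypothetical inverted-U relationship: y rises, then falls, as x increases.
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [1, 2, 4, 5, 6, 7, 5, 4, 2, 1]

r_value = pearson_r(x, y)
eta_value = eta(x, y)
print(f"r = {r_value:.3f}, eta = {eta_value:.3f}")
```

A large gap between eta and the absolute value of r, as in this invented example, is the pattern the rule of thumb treats as evidence of non-linearity; a small gap suggests the linear coefficient captures most of the relationship.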


Slide 6

Interpreting scatter plots

[Figure: scatter plot of RS OCCUPATIONAL PRESTIGE SCORE (1980) (y-axis, 10 to 90) against RESPONDENT'S SOCIOECONOMIC INDEX (x-axis, 0 to 100).]

The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.


Slide 7

Interpreting scatter plots

[Figure: scatter plot of Infant mortality (deaths per 1000 live births) (y-axis, -100 to 200) against Gross domestic product / capita (x-axis, -10000 to 30000).]

Sometimes, a scatter plot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatter plot.


Slide 8

Scatter plots that are difficult to interpret

[Figure, left: scatter plot of TOTAL TIME SPENT ON THE INTERNET (y-axis, -20 to 120) against AGE OF RESPONDENT (x-axis, 10 to 80).]

[Figure, right: scatter plot of TOTAL TIME SPENT ON THE INTERNET (y-axis, -20 to 120) against HOURS PER DAY WATCHING TV (x-axis, -2 to 16).]

The correlations for both of these relationships are low.

The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatter plots.


Slide 9

Using correlation matrices

Correlations

Variables:
(1) TOTAL TIME SPENT ON THE INTERNET
(2) AGE OF RESPONDENT
(3) Logarithm of AGE [LG10(AGE)]
(4) Square of AGE [(AGE)**2]
(5) Square Root of AGE [SQRT(AGE)]
(6) Inverse of AGE [-1/(AGE)]

Variable  Statistic            (1)      (2)      (3)      (4)      (5)      (6)
(1)       Pearson Correlation  1        .017     .048     -.009    .032     .079
          Sig. (2-tailed)      .        .874     .648     .931     .761     .453
          N                    93       93       93       93       93       93
(2)       Pearson Correlation  .017     1        .979**   .983**   .995**   .916**
          Sig. (2-tailed)      .874     .        .000     .000     .000     .000
          N                    93       270      270      270      270      270
(3)       Pearson Correlation  .048     .979**   1        .926**   .994**   .978**
          Sig. (2-tailed)      .648     .000     .        .000     .000     .000
          N                    93       270      270      270      270      270
(4)       Pearson Correlation  -.009    .983**   .926**   1        .960**   .832**
          Sig. (2-tailed)      .931     .000     .000     .        .000     .000
          N                    93       270      270      270      270      270
(5)       Pearson Correlation  .032     .995**   .994**   .960**   1        .951**
          Sig. (2-tailed)      .761     .000     .000     .000     .        .000
          N                    93       270      270      270      270      270
(6)       Pearson Correlation  .079     .916**   .978**   .832**   .951**   1
          Sig. (2-tailed)      .453     .000     .000     .000     .000     .
          N                    93       270      270      270      270      270

**. Correlation is significant at the 0.01 level (2-tailed).

Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret.

The information that we need is in the first column of the matrix which shows the correlation and significance for the dependent variable and all forms of the independent variable.
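The first column of such a matrix can be sketched outside SPSS as follows. This is an illustration under stated assumptions, not the course script: the function name `first_column` and the data are hypothetical, the independent variable is assumed to be strictly positive so that the logarithm and inverse are defined, and significance values are not computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_column(dependent, independent):
    """Correlate the dependent variable with each form of the independent variable."""
    forms = {
        "original": list(independent),
        "logarithm": [math.log10(v) for v in independent],
        "square": [v ** 2 for v in independent],
        "square root": [math.sqrt(v) for v in independent],
        "inverse": [-1 / v for v in independent],
    }
    return {name: pearson_r(values, dependent) for name, values in forms.items()}

# Hypothetical data in which the dependent variable follows the square of x.
x = list(range(1, 11))
y = [v ** 2 + 3 for v in x]

column = first_column(y, x)
best = max(column, key=lambda name: abs(column[name]))

for name, r in column.items():
    print(f"{name:12s} r = {r:.3f}")
print("largest correlation:", best)
```

Reading down the printed column mirrors reading down the first column of the SPSS matrix: the form of the independent variable with the strongest correlation is the candidate transformation.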


Slide 10

The pattern of correlations for no relationship

Correlations

[Table: the Correlations matrix from Slide 9 is repeated here (total time spent on the Internet, age of respondent, and the logarithm, square, square root, and inverse of age). In its first column, the correlations with total time spent on the Internet are .017 for age (Sig. = .874), .048 for the logarithm (.648), -.009 for the square (.931), .032 for the square root (.761), and .079 for the inverse (.453), with N = 93 in each case.]

The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

Moreover, none of the significance tests for the correlations between the dependent variable and the transformed independent variable is statistically significant. There is no relationship between these variables; the problem is not one of non-linearity.


Slide 11

Correlation pattern suggesting transformation

Correlations

Variables:
(1) TOTAL TIME SPENT ON THE INTERNET
(2) HOURS PER DAY WATCHING TV
(3) Logarithm of TVHOURS [LG10(1+TVHOURS)]
(4) Square of TVHOURS [(TVHOURS)**2]
(5) Square Root of TVHOURS [SQRT(1+TVHOURS)]
(6) Inverse of TVHOURS [-1/(1+TVHOURS)]

Variable  Statistic            (1)      (2)      (3)      (4)      (5)      (6)
(1)       Pearson Correlation  1        .215     .104     .328**   .156     .045
          Sig. (2-tailed)      .        .079     .397     .006     .203     .713
          N                    93       68       68       68       68       68
(2)       Pearson Correlation  .215     1        .874**   .903**   .967**   .626**
          Sig. (2-tailed)      .079     .        .000     .000     .000     .000
          N                    68       160      160      160      160      160
(3)       Pearson Correlation  .104     .874**   1        .611**   .967**   .910**
          Sig. (2-tailed)      .397     .000     .        .000     .000     .000
          N                    68       160      160      160      160      160
(4)       Pearson Correlation  .328**   .903**   .611**   1        .774**   .335**
          Sig. (2-tailed)      .006     .000     .000     .        .000     .000
          N                    68       160      160      160      160      160
(5)       Pearson Correlation  .156     .967**   .967**   .774**   1        .784**
          Sig. (2-tailed)      .203     .000     .000     .000     .        .000
          N                    68       160      160      160      160      160
(6)       Pearson Correlation  .045     .626**   .910**   .335**   .784**   1
          Sig. (2-tailed)      .713     .000     .000     .000     .000     .
          N                    68       160      160      160      160      160

**. Correlation is significant at the 0.01 level (2-tailed).

The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.

However, the probability associated with the larger correlation for the square transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.
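The probabilities in the first column can be roughly checked by hand. The sketch below uses the Fisher z approximation, which is an assumption made here for convenience (SPSS reports a probability based on the t distribution, so the third decimal place may differ slightly). The function name `approx_p_value` is hypothetical; the values of r and N are taken from the matrix on this slide.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-tailed probability for a Pearson r based on n cases.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate. This is an approximation, not the
    t-based test reported in SPSS output.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Values from the first column of the matrix (N = 68 for both).
p_original = approx_p_value(0.215, 68)   # hours per day watching TV
p_square = approx_p_value(0.328, 68)     # square of TV hours

print(f"original variable:     p is about {p_original:.3f}")
print(f"square transformation: p is about {p_square:.3f}")
```

The approximate probabilities land close to the reported .079 and .006, which is consistent with the conclusion that only the square transformation reaches statistical significance.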


Slide 12

Transformations

When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.

Four common transformations to induce linearity are: the logarithmic transformation, the square root transformation, the inverse transformation and the square transformation.

All of these transformations produce a new variable that carries the same information as the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.


Slide 13

When transformations do not work

When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship; that is, we lose power to detect the relationship. In addition, estimated values of the dependent variable based on our analysis may be biased or systematically incorrect.

We do have the option of changing the way the information in the variables is represented, e.g. substituting several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
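Substituting dichotomous variables for a metric variable might look like the sketch below. The function name `dummy_code`, the cut points, and the data are hypothetical; how many bands to use and where to cut them is a judgment the slides do not specify.

```python
def dummy_code(values, cut_points):
    """Replace a metric variable with 0/1 indicator variables.

    cut_points must be in ascending order. Values below the first cut
    point form the reference band (all indicators 0); each later band
    gets its own indicator.
    """
    rows = []
    for v in values:
        # Band number = how many cut points the value has reached.
        band = sum(v >= c for c in cut_points)
        rows.append([1 if band == k else 0 for k in range(1, len(cut_points) + 1)])
    return rows

# Hypothetical hours of TV per day, cut into three bands:
# under 2 hours (reference), 2 to under 4 hours, 4 hours or more.
tv_hours = [0, 1, 2, 3, 5, 8]
indicators = dummy_code(tv_hours, cut_points=[2, 4])

for hours, row in zip(tv_hours, indicators):
    print(hours, row)
```

Each indicator can then enter the analysis as its own variable, so no straight-line relationship between the original scores and the dependent variable has to be assumed.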


Slide 14

Strategy for solving problems - 1

Our strategy for determining whether or not a relationship is linear will be based on significance tests for the Pearson r correlation coefficient and tests of partial correlations between transformed variables and the dependent variable, controlling for the correlation between the independent variable and the dependent variable.

If the correlation coefficient between an independent variable and a dependent variable is statistically significant (its probability is less than or equal to a specified level of significance), we will conclude that the relationship is linear.


Slide 15

Strategy for solving problems - 2

If linearity cannot be supported for the untransformed independent and dependent variables, we will examine the transformations for the variables.

If any of the transformations for the independent or dependent variable are statistically significant when the untransformed relationship is not statistically significant, we will conclude that the problem is non-linearity, and can be remedied by substituting the transformed variable in the analysis.

If neither the untransformed variable nor any of the transformations are statistically significant, we will conclude that there is no relationship between the variables. We do not conclude that the relationship is not linear.
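The decision rules on these two slides can be written out as a small function. This is a sketch, not the course script: the function name, the default level of significance of 0.05, and the choice to report the significant transformation with the smallest probability are all assumptions made for illustration. The probabilities in the usage lines come from the correlation matrices shown earlier.

```python
def evaluate_linearity(p_original, p_transformed, alpha=0.05):
    """Apply the strategy: linear, non-linear, or no relationship.

    p_original    -- probability for r between the untransformed variables
    p_transformed -- dict mapping transformation name to its probability
    alpha         -- level of significance (0.05 is an assumed default)
    """
    if p_original <= alpha:
        return "linear"
    significant = {name: p for name, p in p_transformed.items() if p <= alpha}
    if significant:
        # Assumption: report the significant transformation with the smallest p.
        best = min(significant, key=significant.get)
        return f"non-linear: substitute the {best} transformation"
    return "no relationship"

# Internet time and TV hours (first column of the TV hours matrix).
tv_result = evaluate_linearity(
    0.079, {"logarithm": 0.397, "square": 0.006, "square root": 0.203, "inverse": 0.713}
)

# Internet time and age (first column of the age matrix).
age_result = evaluate_linearity(
    0.874, {"logarithm": 0.648, "square": 0.931, "square root": 0.761, "inverse": 0.453}
)

print(tv_result)
print(age_result)
```

With these inputs the conclusions are the same whether the level of significance is 0.05 or 0.01.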


Slide 16

Strategy for solving problems - 3

Even when the relationship with the original independent variable is linear, the analysis might still be enhanced by including a transformed version of the independent variable in the analysis, e.g. including the square of the independent variable in a regression.

If the partial correlation for a transformation is statistically significant, controlling for the relationship between the original independent and dependent variables, we will suggest that the transformed variable be included in the analysis in addition to the original form of the variable. In effect, we are adding the relationship of the transformation to the linear relationship between the independent and dependent variables.
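A first-order partial correlation can be computed from three ordinary correlations with the standard textbook formula. The sketch below is illustrative: the function name and the three input correlations are hypothetical values, not output from the course data.

```python
import math

def partial_correlation(r_yz, r_yx, r_zx):
    """Partial correlation between y and z, controlling for x.

    y -- dependent variable
    z -- transformed independent variable
    x -- original independent variable (the control)
    """
    numerator = r_yz - r_yx * r_zx
    denominator = math.sqrt((1 - r_yx ** 2) * (1 - r_zx ** 2))
    return numerator / denominator

# Hypothetical correlations for illustration only.
r_partial = partial_correlation(r_yz=0.40, r_yx=0.30, r_zx=0.90)
print(f"partial correlation: {r_partial:.3f}")
```

The result describes how strongly the transformed variable is related to the dependent variable after the linear relationship with the original variable has been removed, which is the quantity the strategy tests.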


Slide 17

Problem 1


Slide 18

Creating the scatter plot

The most commonly recommended strategy for evaluating linearity is visual examination of a scatter plot.

To obtain a scatter plot in SPSS, select the Scatter… command from the Graphs menu.


Slide 19

Selecting the type of scatter plot

First, click on the thumbnail sketch of a simple scatter plot to highlight it.

Second, click on the Define button to specify the variables to be included in the scatter plot.


Slide 20

Selecting the variables

First, move the dependent variable netime to the Y Axis text box.

Second, move the independent variable tvhours to the X axis text box.

If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.

Third, click on the OK button to complete the specifications for the scatter plot.


Slide 21

The scatter plot

The scatter plot is produced in the SPSS output viewer.

The points in a scatter plot are considered linear if they form a cigar-shaped elliptical band.

The pattern in this scatter plot is not really clear.


Slide 22

Adding a trend line

To try to determine if the relationship is linear, we can add a trend line to the chart.

To add a trend line to the chart, we need to open the chart for editing.

To open the chart for editing, double click on it.


Slide 23

The scatter plot in the SPSS Chart Editor

The chart that we double clicked on is opened for editing in the SPSS Chart Editor.

The blue border around the plot area indicates that the plot area is selected.

The icon for adding a trend or fit line to the chart is disabled, so we cannot select it.


Slide 24

Enabling the Add fit line icon

To activate the Add fit line icon, click on one of the points.

When the points are selected, they are bordered in blue.

When the points are selected, their Properties dialog opens. We could use this dialog to change the marker, color, etc.


Slide 25

Requesting the fit line

With the points selected, click on the Add fit line icon.


Slide 26

The fit line and r²

The linear trend or fit line is added to the chart and the Properties dialog for the fit line is opened.

By default, the trend line is linear, and the value for R Square is included in the chart.

The value of r² (0.046) suggests that the relationship is weak.
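For a linear fit with one independent variable, R Square is the square of the Pearson correlation. The short check below assumes the chart is based on the same cases as the correlation of .215 between hours per day watching TV and total time spent on the Internet reported in the correlation matrix for these two variables.

```python
# Pearson r between TV hours and Internet time, from the correlation matrix.
r = 0.215

# For a straight-line fit, R Square equals r squared.
r_squared = r ** 2

print(round(r_squared, 3))
```

The rounded result agrees with the value of R Square shown on the chart.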


Slide 27

Changing the shape of the fit line

We can try a trend line with a curved shape to see if it does a better job of fitting the data.

To change the trend line:

First, click on Quadratic in the Fit Method panel. This will fit a trend line that includes a square term (x²) in the equation.

Second, click on the Apply button to change the trend line.


Slide 28

The quadratic fit line and r²

The fit line curves to reduce the discrepancies between the line and the data points.

The value of r² (0.159) falls at the top of the weak range, indicating a stronger relationship than the one represented by the linear fit line.

This result hints that a squared transformation of the independent variable may be needed.
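The comparison between a linear and a quadratic fit line can be sketched without SPSS. The code below is an illustration under stated assumptions: it fits polynomials by ordinary least squares using the normal equations, the function names are hypothetical, and the data are invented so that the curved pattern is exact.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients for x**0, x**1, ..."""
    n = degree + 1
    # Normal equations (X'X) b = X'y built from sums of powers of x.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - known) / a[i][i]
    return coef

def r_squared(x, y, coef):
    """Proportion of variance in y explained by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    predicted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical U-shaped data: a straight line explains nothing, a curve everything.
x = list(range(9))
y = [(xi - 4) ** 2 for xi in x]

r2_linear = r_squared(x, y, fit_polynomial(x, y, degree=1))
r2_quadratic = r_squared(x, y, fit_polynomial(x, y, degree=2))

print(f"linear R Square:    {r2_linear:.3f}")
print(f"quadratic R Square: {r2_quadratic:.3f}")
```

Real data will show a much smaller gap, as in the chart above, but the reading is the same: a clear gain in R Square from the quadratic line points toward a square term.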


Slide 29

Changing the color of the fit line

First, click the line panel to choose a color.

Second, select the color you want for the fit line.

Third, click on the Apply button; the color of the trend line will change.


Slide 30

Computing the transformations

There are four transformations that we can use to achieve or improve linearity.

The compute dialogs for these four transformations for linearity are shown.
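The four computations can be sketched as follows. The formulas follow the variable labels in the correlation matrices, such as LG10(1+TVHOURS), SQRT(1+TVHOURS), -1/(1+TVHOURS) and (TVHOURS)**2; the function name, the `offset` parameter, and the data are hypothetical. The labels for hours worked use a reflected form, 81-HRS1, which is not covered here.

```python
import math

def transformations(values, offset=0.0):
    """Compute the four transformations used to achieve or improve linearity.

    offset is added before the logarithm, square root, and inverse so that
    a variable containing zeros (such as hours of TV) can be transformed.
    """
    return {
        "logarithm": [math.log10(offset + v) for v in values],
        "square": [v ** 2 for v in values],
        "square root": [math.sqrt(offset + v) for v in values],
        "inverse": [-1 / (offset + v) for v in values],
    }

# Hypothetical hours of TV per day; offset=1 mirrors the 1+TVHOURS labels.
tv_hours = [0, 1, 3, 9]
new_vars = transformations(tv_hours, offset=1)

for name, values in new_vars.items():
    print(f"{name:12s}", [round(v, 3) for v in values])
```

The negative sign on the inverse keeps larger original scores associated with larger transformed scores, so the direction of a correlation is not flipped by the transformation.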


Slide 31

Creating the scatter plot matrix

To create the scatter plot matrix, select the Scatter… command in the Graphs menu.


Slide 32

Selecting type of scatterplot

First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want.

Second, click on the Define button to select the variables for the scatterplot.


Slide 33

Specifications for scatterplot matrix

First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box.

Second, click on the OK button to produce the scatterplot.


Slide 34

The scatter plot matrix

The scatter plot matrix shows a thumbnail sketch of scatter plots for each independent variable or transformation with the dependent variable. The scatter plot matrix may suggest which transformations might be useful.


Slide 35

Creating the correlation matrix

To create the correlation matrix, select the Correlate | Bivariate… command in the Analyze menu.


Slide 36

Specifications for correlation matrix

First, move the dependent variable, the independent variable and all of the transformations to the Variables list box.

Second, click on the OK button to produce the correlation matrix.


Slide 37

The correlation matrix and the original problem

[Table: the Correlations matrix from Slide 11 is repeated here (total time spent on the Internet, hours per day watching TV, and the logarithm, square, square root, and inverse of TV hours). In its first column, the correlations with total time spent on the Internet are .215 for TV hours (Sig. = .079), .104 for the logarithm (.397), .328** for the square (.006), .156 for the square root (.203), and .045 for the inverse (.713), with N = 68 in each case. **. Correlation is significant at the 0.01 level (2-tailed).]

The output from the script can be used to answer the problem question. The probability associated with the correlation coefficient between the untransformed variables (p = 0.079) is greater than the level of significance, suggesting either that the relationship is non-linear (if one of the transformations is significant) or weak (if none of the correlations is significant).

The probability associated with the correlation between the dependent variable and the square transformation (p = 0.006) is less than the level of significance. The relationship between hours watching TV and time spent on the Internet is not linear, and the transformed variable should be substituted for hours watching TV.


Slide 38

Problem 1 - Answer

The answer is false because the relationship is not linear, and the transformation should be substituted for the independent variable, not added to the analysis.


Slide 39

The script for testing assumption of linearity

The SPSS script can be used to test the assumption of linearity.

First, move the dependent and independent variables to the list boxes.

Second, mark the Assumption of linearity option button.

Third, accept or mark the transformation to be included in the analysis.

Fourth, click on the OK button to produce the output.


Slide 40

The scatter plot matrix produced by the script

The scatter plot matrix provides a thumbnail sketch of each of the relationships. While we will base our answers on the correlation matrix, the scatter plot matrix should provide visual confirmation of our conclusions.


Slide 41

The correlation matrix produced by the script

The answer to this problem is based on the correlation matrix.

With a non-significant relationship between the untransformed independent variable and the dependent variable, combined with a significant relationship with a transformed variable, we conclude that the relationship is not linear.


Slide 42

Partial correlations produced by the script

Had we concluded that the relationship between the independent and dependent variables was linear, we would have asked whether or not the analysis could be improved by the addition of a transformed variable.

We answer this second question by examining the statistical significance of the partial correlations, controlling for the linear relationship between the dependent and independent variable.


Slide 43

Problem 2


Slide 44

The correlation matrix

Correlations

Variables:
(1) TOTAL TIME SPENT ON THE INTERNET
(2) NUMBER OF HOURS WORKED LAST WEEK
(3) Logarithm of HRS1 [LG10(81-HRS1)]
(4) Square of HRS1 [(HRS1)**2]
(5) Square Root of HRS1 [SQRT(81-HRS1)]
(6) Inverse of HRS1 [-1/(81-HRS1)]

Variable  Statistic            (1)      (2)      (3)      (4)      (5)      (6)
(1)       Pearson Correlation  1        .082     -.103    .090     -.091    -.135
          Sig. (2-tailed)      .        .486     .379     .444     .438     .247
          N                    93       75       75       75       75       75
(2)       Pearson Correlation  .082     1        -.871**  .965**   -.981**  -.361**
          Sig. (2-tailed)      .486     .        .000     .000     .000     .000
          N                    75       176      176      176      176      176
(3)       Pearson Correlation  -.103    -.871**  1        -.940**  .946**   .743**
          Sig. (2-tailed)      .379     .000     .        .000     .000     .000
          N                    75       176      176      176      176      176
(4)       Pearson Correlation  .090     .965**   -.940**  1        -.993**  -.475**
          Sig. (2-tailed)      .444     .000     .000     .        .000     .000
          N                    75       176      176      176      176      176
(5)       Pearson Correlation  -.091    -.981**  .946**   -.993**  1        .502**
          Sig. (2-tailed)      .438     .000     .000     .000     .        .000
          N                    75       176      176      176      176      176
(6)       Pearson Correlation  -.135    -.361**  .743**   -.475**  .502**   1
          Sig. (2-tailed)      .247     .000     .000     .000     .000     .
          N                    75       176      176      176      176      176

**. Correlation is significant at the 0.01 level (2-tailed).

The probability associated with the correlation coefficient between "number of hours worked in the past week" and "total hours spent on the Internet" (p = 0.486) is greater than the level of significance, suggesting either that the relationship is non-linear (if one of the transformations is significant) or weak (if none of the correlations is significant).

The lack of statistical significance for all of the transformations suggests that there is a weak relationship between "number of hours worked in the past week" and "total hours spent on the Internet", and the lack of relationship is not attributable to non-linearity.


Slide 45

Problem 2 - Answer

Without any evidence that there is a non-linear relationship, the answer to the question is true. There is a weak or very weak linear relationship.


Slide 46

Problem 3


Slide 47

The correlation matrix

The correlation between "highest year of school completed" and "occupational prestige score" was statistically significant (r=.495, p<0.001). A linear relationship exists between these variables.

Slide 48

The partial correlation matrix - 1

Controlling for "highest year of school completed", the partial correlation for several of the transformations indicated a statistically significant relationship to "occupational prestige score": the logarithmic transformation (r=-0.254, p<0.001); the square root transformation (r=-0.257, p<0.001); the inverse transformation (r=-0.232, p<0.001); and the square transformation (r=0.246, p<0.001).
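A partial correlation of this kind can be sketched with the standard first-order formula, r(yz.x) = (r_yz - r_yx * r_zx) / sqrt((1 - r_yx^2)(1 - r_zx^2)). The example below is a minimal illustration, not the SPSS procedure used for the slides: the data are invented, and `partial_r` is a hypothetical helper name. The variable names `educ`, `prestg80`, and `sqeduc` only mirror the labels used in the slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(y, z, x):
    """First-order partial correlation of y and z, controlling for x."""
    r_yz = pearson_r(y, z)
    r_yx = pearson_r(y, x)
    r_zx = pearson_r(z, x)
    return (r_yz - r_yx * r_zx) / math.sqrt((1 - r_yx ** 2) * (1 - r_zx ** 2))


# Invented illustration data, not the data set behind the slides.
educ = [8, 10, 12, 12, 14, 16, 16, 18, 20, 12]
prestg80 = [25, 30, 38, 42, 45, 60, 52, 66, 75, 35]
sqeduc = [e ** 2 for e in educ]  # the square transformation

# Does the squared term relate to prestige beyond what educ already explains?
r_partial = partial_r(prestg80, sqeduc, educ)
print(f"partial r (sqeduc, prestg80 | educ) = {r_partial:+.3f}")
```

Controlling for the untransformed variable is what lets the partial correlation show whether a transformation adds anything to a relationship that is already linear.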

Slide 49

The partial correlation matrix - 2

The partial correlation of 0.2463 between the square of "highest year of school completed" [SQEDUC=EDUC²] and "occupational prestige score" [prestg80] controlling for "highest year of school completed" [educ] was higher than the other partial correlations, so the square transformation should be included in the analysis.

Slide 50

Problem 3 - Answer

The relationship between the independent and dependent variables is linear, and the square transformation makes a statistically significant addition to the relationship. The answer to the question is true.

Slide 51

Other problems on assumption of linearity

A problem may ask about the assumption of linearity for a nominal level variable. The answer will be “An incorrect application of a statistic” since linearity does not apply to nominal variables.

A problem may ask about the assumption of linearity for an ordinal level variable. If the variable or transformed variable is linear, the correct answer to the question is “True with caution” since we may be required to defend treating an ordinal variable as metric.

Questions will specify a level of significance to use in testing the correlations and partial correlations.

Slide 52

Steps in answering questions about the assumption of linearity – question 1

Question: Is the relationship between the dependent variable and the independent variable linear?

1. Are all of the variables to be evaluated metric?
   No: Incorrect application of a statistic.
   Yes: go to step 2.

2. Is the correlation for the untransformed variables statistically significant?
   Yes: go to step 4.
   No: go to step 3.

3. Is the correlation for the transformed variables statistically significant?
   Yes: False (non-linear).
   No: go to step 4 (there is no evidence of non-linearity; the linear relationship is weak).

4. Is either variable ordinal level?
   Yes: True with caution (linear).
   No: True (linear).
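The steps on this slide can be sketched as a small function. This is an illustration only: the function name and parameter names are hypothetical. It also assumes that when neither the untransformed nor the transformed correlations are significant, the answer is "True", which matches the answer to Problem 2. Ordinal variables are counted as metric in the first check and handled by the caution step.

```python
def linearity_answer(all_metric, untransformed_significant,
                     transformed_significant, any_ordinal):
    """Answer to 'Is the relationship linear?' following the slide's steps."""
    # A nominal variable makes the question an incorrect application.
    if not all_metric:
        return "Incorrect application of a statistic"
    # Only a transformation reaches significance: evidence of non-linearity.
    if not untransformed_significant and transformed_significant:
        return "False (non-linear)"
    # Otherwise linear (assumption: possibly weak when nothing is significant).
    if any_ordinal:
        return "True with caution (linear)"
    return "True (linear)"


# Problem 2 pattern: nothing significant, both variables metric.
print(linearity_answer(True, False, False, False))
```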

Slide 53

Steps in answering questions about the assumption of linearity – question 2

Question: Is the relationship between the dependent variable and the independent variable linear, but improvable?

1. Are all of the variables to be evaluated metric?
   No: Incorrect application of a statistic.
   Yes: go to step 2.

2. Is the correlation for the untransformed variables statistically significant?
   No: go to step 3.
   Yes: go to step 4.

3. Is the correlation for the transformed variables statistically significant?
   Yes: False (non-linear).
   No: False (weak linear, not improvable).

4. Is the correlation for the transformed variables statistically significant?
   Yes: True/True with caution (linear, improvable).
   No: False (linear, not improvable).
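The steps for question 2 can be sketched the same way. The function and parameter names are hypothetical. The split between "True" and "True with caution" by ordinal level is an assumption carried over from the earlier slide on ordinal variables.

```python
def improvable_answer(all_metric, untransformed_significant,
                      transformed_significant, any_ordinal=False):
    """Answer to 'Is the relationship linear, but improvable?'."""
    if not all_metric:
        return "Incorrect application of a statistic"
    if not untransformed_significant:
        # No linear relationship was detected for the untransformed variables.
        if transformed_significant:
            return "False (non-linear)"
        return "False (weak linear, not improvable)"
    # Linear relationship detected; check whether a transformation adds to it.
    if not transformed_significant:
        return "False (linear, not improvable)"
    if any_ordinal:  # assumption: ordinal level selects the cautious answer
        return "True with caution (linear, improvable)"
    return "True (linear, improvable)"


# Problem 3 pattern: significant correlation and significant square term.
print(improvable_answer(True, True, True))
```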