TRANSCRIPT
SW388R7 Data Analysis & Computers II
Slide 1
Assumption of linearity
Assumption of linearity
Strategy for solving problems
Producing outputs for evaluating linearity
Assumption of linearity script
Sample Problems
Slide 2
Assumption of linearity
The statistics that we will study this semester generally assume that the relationship between variables is linear, or they perform better if the relationships are linear.
If a relationship is nonlinear, the statistics which assume it is linear will underestimate the strength of the relationship, or fail to detect the existence of a relationship.
Slide 3
Linearity
Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.
Some relationships are not linear. The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.
Population growth may not be linear. The pattern often shows growth at increasing rates over time.
Slide 4
Population growth in Texas
[Chart: Population of Texas (Y axis, 0 to 20,000,000) by Year (X axis, 1850 to 1990)]
The increase in population for the ten years from 1860 to 1870 (a difference of 214,364) is relatively small compared to the increase in population for the ten years from 1960 to 1970 (a difference of 1,617,053).
Slide 5
Evaluating linearity
There are both graphical and statistical methods for evaluating linearity.
Graphical methods include the examination of scatter plots, often overlaid with a trend line. While this strategy is commonly recommended, scatter plots are often difficult to interpret.
Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
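The rule of thumb comparing r and eta can be illustrated with a minimal Python sketch. It assumes eta is computed as the correlation ratio (the square root of the between-groups sum of squares divided by the total sum of squares), treating each distinct value of the independent variable as a group. The data and the function names are made up for illustration.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eta(x, y):
    """Correlation ratio of y on x, grouping y by each distinct x value."""
    groups = defaultdict(list)
    for a, b in zip(x, y):
        groups[a].append(b)
    grand_mean = sum(y) / len(y)
    ss_total = sum((b - grand_mean) ** 2 for b in y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)

# Made-up data: y rises and then falls as x increases (an inverted U).
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [1, 2, 4, 5, 6, 7, 4, 5, 1, 2]

r = pearson_r(x, y)
e = eta(x, y)
# A large gap between eta and |r| points to a nonlinear relationship.
print(f"r = {r:.3f}, eta = {e:.3f}, gap = {e - abs(r):.3f}")
```

For these made-up scores the linear correlation is essentially zero while eta is large, which is the pattern the rule of thumb is meant to catch.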
Slide 6
Interpreting scatter plots
[Scatter plot: RS OCCUPATIONAL PRESTIGE SCORE (1980) (Y axis) by RESPONDENT'S SOCIOECONOMIC INDEX (X axis)]
The advice for interpreting linearity is often phrased as looking for a cigar-shaped band, which is very evident in this plot.
Slide 7
Interpreting scatter plots
[Scatter plot: Infant mortality (deaths per 1000 live births) (Y axis) by Gross domestic product / capita (X axis)]
Sometimes, a scatter plot shows a clearly nonlinear pattern that requires transformation, like the one shown in the scatter plot.
Slide 8
Scatter plots that are difficult to interpret
[Scatter plot, left: TOTAL TIME SPENT ON THE INTERNET (Y axis) by AGE OF RESPONDENT (X axis)]
[Scatter plot, right: TOTAL TIME SPENT ON THE INTERNET (Y axis) by HOURS PER DAY WATCHING TV (X axis)]
The correlations for both of these relationships are low.
The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatter plots.
Slide 9
Using correlation matrices
Correlations

                              (1)      (2)      (3)      (4)      (5)      (6)
(1) TOTAL TIME SPENT ON THE INTERNET
  Pearson Correlation           1     .017     .048    -.009     .032     .079
  Sig. (2-tailed)               .     .874     .648     .931     .761     .453
  N                            93       93       93       93       93       93
(2) AGE OF RESPONDENT
  Pearson Correlation        .017        1   .979**   .983**   .995**   .916**
  Sig. (2-tailed)            .874        .     .000     .000     .000     .000
  N                            93      270      270      270      270      270
(3) Logarithm of AGE [LG10(AGE)]
  Pearson Correlation        .048   .979**        1   .926**   .994**   .978**
  Sig. (2-tailed)            .648     .000        .     .000     .000     .000
  N                            93      270      270      270      270      270
(4) Square of AGE [(AGE)**2]
  Pearson Correlation       -.009   .983**   .926**        1   .960**   .832**
  Sig. (2-tailed)            .931     .000     .000        .     .000     .000
  N                            93      270      270      270      270      270
(5) Square Root of AGE [SQRT(AGE)]
  Pearson Correlation        .032   .995**   .994**   .960**        1   .951**
  Sig. (2-tailed)            .761     .000     .000     .000        .     .000
  N                            93      270      270      270      270      270
(6) Inverse of AGE [-1/(AGE)]
  Pearson Correlation        .079   .916**   .978**   .832**   .951**        1
  Sig. (2-tailed)            .453     .000     .000     .000     .000        .
  N                            93      270      270      270      270      270

**. Correlation is significant at the 0.01 level (2-tailed).
Columns (1) to (6) are the same variables as rows (1) to (6).
Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret.
The information that we need is in the first column of the matrix which shows the correlation and significance for the dependent variable and all forms of the independent variable.
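The first column of such a matrix can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and the p-value is an approximation based on the Fisher z transformation rather than the test SPSS reports. The four transformations follow the formulas shown in the table labels (LG10, square, SQRT, and -1/x).

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p(r, n):
    """Approximate two-tailed p-value for r via the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Original variable plus the four transformations (x must be positive here).
TRANSFORMS = {
    "original": lambda v: v,
    "logarithm [LG10(x)]": math.log10,
    "square [x**2]": lambda v: v ** 2,
    "square root [SQRT(x)]": math.sqrt,
    "inverse [-1/x]": lambda v: -1 / v,
}

def first_column(x, y):
    """Correlate y with x and with each transformed version of x."""
    rows = {}
    for name, f in TRANSFORMS.items():
        r = pearson_r([f(v) for v in x], y)
        rows[name] = (r, approx_p(r, len(y)))
    return rows

# Made-up data in which y grows with the square of x, plus a fixed disturbance.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noise = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.4, 0.7, -0.5, 0.1]
y = [v ** 2 + e for v, e in zip(x, noise)]

rows = first_column(x, y)
for name, (r, p) in rows.items():
    print(f"{name:24s} r = {r:6.3f}  approx. p = {p:.4f}")
```

Reading down the printed column plays the same role as reading down the first column of the SPSS matrix: the form of the independent variable with the strongest significant correlation is the candidate for the analysis.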
Slide 10
The pattern of correlations for no relationship
[Correlation matrix repeated from Slide 9. First column (correlations with TOTAL TIME SPENT ON THE INTERNET): AGE OF RESPONDENT r = .017, p = .874; LG10(AGE) r = .048, p = .648; (AGE)**2 r = -.009, p = .931; SQRT(AGE) r = .032, p = .761; -1/(AGE) r = .079, p = .453; N = 93 for each.]
The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.
Moreover, none of the correlations with the transformed versions of the independent variable is statistically significant. There is no relationship between these variables; it is not a problem with non-linearity.
Slide 11
Correlation pattern suggesting transformation
Correlations

                              (1)      (2)      (3)      (4)      (5)      (6)
(1) TOTAL TIME SPENT ON THE INTERNET
  Pearson Correlation           1     .215     .104   .328**     .156     .045
  Sig. (2-tailed)               .     .079     .397     .006     .203     .713
  N                            93       68       68       68       68       68
(2) HOURS PER DAY WATCHING TV
  Pearson Correlation        .215        1   .874**   .903**   .967**   .626**
  Sig. (2-tailed)            .079        .     .000     .000     .000     .000
  N                            68      160      160      160      160      160
(3) Logarithm of TVHOURS [LG10(1+TVHOURS)]
  Pearson Correlation        .104   .874**        1   .611**   .967**   .910**
  Sig. (2-tailed)            .397     .000        .     .000     .000     .000
  N                            68      160      160      160      160      160
(4) Square of TVHOURS [(TVHOURS)**2]
  Pearson Correlation      .328**   .903**   .611**        1   .774**   .335**
  Sig. (2-tailed)            .006     .000     .000        .     .000     .000
  N                            68      160      160      160      160      160
(5) Square Root of TVHOURS [SQRT(1+TVHOURS)]
  Pearson Correlation        .156   .967**   .967**   .774**        1   .784**
  Sig. (2-tailed)            .203     .000     .000     .000        .     .000
  N                            68      160      160      160      160      160
(6) Inverse of TVHOURS [-1/(1+TVHOURS)]
  Pearson Correlation        .045   .626**   .910**   .335**   .784**        1
  Sig. (2-tailed)            .713     .000     .000     .000     .000        .
  N                            68      160      160      160      160      160

**. Correlation is significant at the 0.01 level (2-tailed).
Columns (1) to (6) are the same variables as rows (1) to (6).
The correlation between the two variables is weak (r = .215) and not statistically significant (p = .079). If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables.
However, the probability associated with the larger correlation for the square transformation is statistically significant, suggesting that this is a transformation we might want to use in our analysis.
Slide 12
Transformations
When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.
Four common transformations to induce linearity are: the logarithmic transformation, the square root transformation, the inverse transformation and the square transformation.
All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.
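The four transformations can be sketched in a few lines of Python. The sketch assumes non-negative scores and adds a constant before the logarithm, square root, and inverse, in the way the TVHOURS labels in the correlation matrix do (for example LG10(1+TVHOURS)), so that a score of 0 does not cause an error. The function name, the `offset` parameter, and the scores are hypothetical.

```python
import math

def transform_all(values, offset=0.0):
    """Return the four transformed versions of a list of non-negative scores.

    offset is added before the logarithm, square root, and inverse so that a
    score of 0 does not break log10 or the division.
    """
    return {
        "log": [math.log10(offset + v) for v in values],
        "sqrt": [math.sqrt(offset + v) for v in values],
        "inverse": [-1 / (offset + v) for v in values],
        "square": [v ** 2 for v in values],
    }

# Made-up scores that include 0, as hours of TV per day can.
tv_hours = [0, 1, 2, 4, 8]
new_vars = transform_all(tv_hours, offset=1)

for name, scores in new_vars.items():
    # Each transformed list is still in increasing order: the cases keep their
    # rank order, and only the spacing between scores changes.
    print(name, [round(s, 3) for s in scores], scores == sorted(scores))
```

Writing the inverse as -1/x rather than 1/x keeps the scores in their original order, which is why the sign appears in the variable labels.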
Slide 13
When transformations do not work
When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship; that is, we lose power to detect the relationship, and estimated values of the dependent variable based on our analysis may be biased or systematically incorrect.
We do have the option of changing the way the information in the variables is represented, e.g. substituting several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
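Substituting dichotomous variables for a metric variable might look like the following sketch. The cut points, the ages, and the function name are hypothetical; how a metric variable should be grouped is a substantive decision that this example does not settle.

```python
def dummy_code(values, cut_points):
    """Replace a metric variable with dichotomous (0/1) indicator variables.

    A value's category is the number of cut points it equals or exceeds.
    Category 0 (below the first cut point) is the reference group and gets
    no indicator of its own.
    """
    rows = []
    for v in values:
        category = sum(v >= c for c in cut_points)
        rows.append(
            [1 if category == i else 0 for i in range(1, len(cut_points) + 1)]
        )
    return rows

# Made-up ages grouped as under 30, 30 to 49, 50 to 64, and 65 or older.
ages = [19, 34, 52, 71]
indicators = dummy_code(ages, cut_points=[30, 50, 65])
for age, row in zip(ages, indicators):
    print(age, row)
```

Each indicator is compared with the reference group separately, so the analysis no longer requires a constant rate of change across the whole range of the original variable.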
Slide 14
Strategy for solving problems - 1
Our strategy for determining whether or not a relationship is linear will be based on significance tests for the Pearson r correlation coefficient and tests of partial correlations between transformed variables and the dependent variable, controlling for the correlation between the independent variable and the dependent variable.
If the correlation coefficient between an independent variable and a dependent variable is statistically significant (its probability is less than or equal to a specified level of significance), we will conclude that the relationship is linear.
Slide 15
Strategy for solving problems - 2
If linearity cannot be supported for the untransformed independent and dependent variables, we will examine the transformations for the variables.
If any of the transformations for the independent or dependent variable are statistically significant when the untransformed relationship is not statistically significant, we will conclude that the problem is non-linearity, and can be remedied by substituting the transformed variable in the analysis.
If neither the untransformed variable nor any of the transformations are statistically significant, we will conclude that there is no relationship between the variables. We do not conclude that the relationship is not linear.
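The decision rule on these two slides can be written as a small function. This is a sketch, not the course script: the function name is hypothetical and the default level of significance of 0.05 is an assumption. The probabilities in the example are the Sig. (2-tailed) values in the first column of the correlation matrices on Slides 10 and 11.

```python
def evaluate_linearity(p_original, p_transformed, alpha=0.05):
    """Apply the decision rule to significance values from a correlation matrix.

    p_original: probability for the correlation of the untransformed variables.
    p_transformed: dict mapping a transformation name to its probability.
    alpha: level of significance (the default of 0.05 is an assumption).
    """
    if p_original <= alpha:
        return "linear"
    significant = [name for name, p in p_transformed.items() if p <= alpha]
    if significant:
        return "non-linear; substitute: " + ", ".join(significant)
    return "no relationship"

# Internet time with age (Slide 10): nothing is significant.
age = evaluate_linearity(
    0.874, {"log": 0.648, "square": 0.931, "sqrt": 0.761, "inverse": 0.453}
)
# Internet time with hours watching TV (Slide 11): only the square is significant.
tv = evaluate_linearity(
    0.079, {"log": 0.397, "square": 0.006, "sqrt": 0.203, "inverse": 0.713}
)
print(age)  # no relationship
print(tv)   # non-linear; substitute: square
```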
Slide 16
Strategy for solving problems - 3
Even when the relationship with the original independent variable is linear, the analysis might still be enhanced by the inclusion of a transformed version of the independent variable in the analysis, e.g. including the square of the independent variable in a regression.
If the partial correlation for a transformation is statistically significant controlling for the relationship between the original independent and dependent variables, we will suggest that the transformed variable be included in the analysis, in addition to the original form of the variables. In effect, we are adding the relationship of the transformation to the linear relationship between the independent and dependent variable.
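A partial correlation of this kind can be computed from three ordinary correlations with the standard first-order formula. The sketch below assumes that formula; y is the dependent variable, x the original independent variable, and z the transformed variable. The function name and the example correlations are made up, and the significance test that SPSS reports alongside the coefficient is omitted.

```python
import math

def partial_r(r_yz, r_yx, r_zx):
    """First-order partial correlation of y and z, controlling for x.

    r_yz: correlation of the dependent variable with the transformed variable.
    r_yx: correlation of the dependent variable with the original variable.
    r_zx: correlation of the transformed variable with the original variable.
    """
    return (r_yz - r_yx * r_zx) / math.sqrt((1 - r_yx ** 2) * (1 - r_zx ** 2))

# Made-up correlations: the transformed variable is highly correlated with the
# original (0.90), so only part of its correlation with y is a new contribution.
added = partial_r(r_yz=0.60, r_yx=0.50, r_zx=0.90)
print(round(added, 3))
```

Because a transformation is usually highly correlated with the variable it was computed from, its partial correlation is typically much smaller than its simple correlation with the dependent variable.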
Slide 17
Problem 1
Slide 18
Creating the scatter plot
The most commonly recommended strategy for evaluating linearity is visual examination of a scatter plot.
To obtain a scatter plot in SPSS, select the Scatter… command from the Graphs menu.
Slide 19
Selecting the type of scatter plot
First, click on the thumbnail sketch of a simple scatter plot to highlight it.
Second, click on the Define button to specify the variables to be included in the scatter plot.
Slide 20
Selecting the variables
First, move the dependent variable netime to the Y Axis text box.
Second, move the independent variable tvhours to the X axis text box.
If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.
Third, click on the OK button to complete the specifications for the scatter plot.
Slide 21
The scatter plot
The scatter plot is produced in the SPSS output viewer.
The points in a scatter plot are considered linear if they form a cigar-shaped elliptical band.
The pattern in this scatter plot is not really clear.
Slide 22
Adding a trend line
To try to determine if the relationship is linear, we can add a trend line to the chart.
To add a trend line to the chart, we need to open the chart for editing.
To open the chart for editing, double click on it.
Slide 23
The scatter plot in the SPSS Chart Editor
The chart that we double clicked on is opened for editing in the SPSS Chart Editor.
The blue border around the plot area indicates that the plot area is selected.
The icon for adding a trend or fit line to the chart is disabled, so we cannot select it.
Slide 24
Enabling the Add fit line icon
To activate the Add fit line icon, click on one of the points.
When the points are selected, they are bordered in blue.
When the points are selected, their Properties dialog opens. We could use this dialog to change the marker, color, etc.
Slide 25
Requesting the fit line
With the points selected, click on the Add fit line icon.
Slide 26
The fit line and r²
The linear trend or fit line is added to the chart and the Properties dialog for the fit line is opened.
By default, the trend line is linear, and the value for R Square is included in the chart.
The value of r² (0.046) suggests that the relationship is weak.
Slide 27
Changing the shape of the fit line
We can try a trend line with a curved shape to see if it does a better job of fitting the data.
To change the trend line:
First, click on Quadratic in the Fit Method panel. This will fit a trend line that includes a squared term (x²) in the equation.
Second, click on the Apply button to change the trend line.
Slide 28
The quadratic fit line and r²
The fit line curves to reduce the discrepancies between the line and the data points.
The value of r² (0.159) falls at the top of the weak range, indicating a stronger relationship than the one represented by the linear fit line.
This result hints that a squared transformation of the independent variable may be needed.
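The comparison between the linear and quadratic fit lines can be reproduced outside the Chart Editor. The sketch below is one way to do it, assuming the standard formula for R square with two predictors (here x and x²) expressed in terms of their correlations. The data and function names are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_square_linear(x, y):
    """R square for the straight fit line: the squared correlation of x and y."""
    return pearson_r(x, y) ** 2

def r_square_quadratic(x, y):
    """R square for the fit line that uses both x and x**2 as predictors."""
    x2 = [v ** 2 for v in x]
    r1, r2, r12 = pearson_r(x, y), pearson_r(x2, y), pearson_r(x, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# Made-up scores in which y increases faster at higher values of x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 3, 2, 4, 5, 9, 12, 18]

linear = r_square_linear(x, y)
quadratic = r_square_quadratic(x, y)
print(f"linear R square = {linear:.3f}, quadratic R square = {quadratic:.3f}")
```

The quadratic value can never be lower than the linear one, since the quadratic equation contains the linear term; what matters is whether the increase is large enough to suggest a transformation.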
Slide 29
Changing the color of the fit line
First, click the line panel to choose a color.
Second, select the color that you want for the trend line.
Third, click on the Apply button; the color of the trend line will change.
Slide 30
Computing the transformations
There are four transformations that we can use to achieve or improve linearity.
The compute dialogs for these four transformations for linearity are shown.
Slide 31
Creating the scatter plot matrix
To create the scatter plot matrix, select the Scatter… command in the Graphs menu.
Slide 32
Selecting type of scatterplot
First, click on the Matrix thumbnail sketch to indicate which type of scatterplot we want.
Second, click on the Define button to select the variables for the scatterplot.
Slide 33
Specifications for scatterplot matrix
First, move the dependent variable, the independent variable and all of the transformations to the Matrix Variables list box.
Second, click on the OK button to produce the scatterplot.
Slide 34
The scatter plot matrix
The scatter plot matrix shows a thumbnail sketch of scatter plots for each independent variable or transformation with the dependent variable. The scatter plot matrix may suggest which transformations might be useful.
Slide 35
Creating the correlation matrix
To create the correlation matrix, select the Correlate | Bivariate… command in the Analyze menu.
Slide 36
Specifications for correlation matrix
First, move the dependent variable, the independent variable and all of the transformations to the Variables list box.
Second, click on the OK button to produce the correlation matrix.
Slide 37
The correlation matrix and the original problem
[Correlation matrix repeated from Slide 11. First column (correlations with TOTAL TIME SPENT ON THE INTERNET): HOURS PER DAY WATCHING TV r = .215, p = .079; LG10(1+TVHOURS) r = .104, p = .397; (TVHOURS)**2 r = .328, p = .006; SQRT(1+TVHOURS) r = .156, p = .203; -1/(1+TVHOURS) r = .045, p = .713; N = 68 for each.]
The output from the script can be used to answer the problem question. The correlation coefficient between the untransformed variables is not statistically significant (p = 0.079), suggesting either that the relationship is non-linear (if one of the transformations is significant) or weak (if none of the correlations is significant).
The probability associated with the correlation between the dependent variable and the square transformation (0.006) is less than the level of significance. The relationship between hours watching TV and time spent on the Internet is not linear, and the transformed variable should be substituted for hours watching TV.
Slide 38
Problem 1 - Answer
The answer is false because the relationship is not linear, and the transformation should be substituted for the independent variable, not added to the analysis.
Slide 39
The script for testing assumption of linearity
The SPSS script can be used to test the assumption of linearity.
First, move the dependent and independent variables to the list boxes.
Second, mark the Assumption of linearity option button.
Third, accept or mark the transformation to be included in the analysis.
Fourth, click on the OK button to produce the output.
Slide 40
The scatter plot matrix produced by the script
The scatter plot matrix provides a thumbnail sketch of each of the relationships. While we will base our answers on the correlation matrix, the scatter plot matrix should provide visual confirmation of our conclusions.
Slide 41
The correlation matrix produced by the script
The answer to this problem is based on the correlation matrix.
With a non-significant relationship between the untransformed independent variable and the dependent variable, combined with a significant relationship with a transformed variable, we conclude that the relationship is not linear.
Slide 42
Partial correlations produced by the script
Had we concluded that the relationship between the independent and dependent variables was linear, we would then have asked whether or not the analysis could be improved by the addition of a transformed variable.
We answer this second question by examining the statistical significance of the partial correlations, controlling for the linear relationship between the dependent and independent variable.
Slide 43
Problem 2
Slide 44
The correlation matrix

Correlations

                              (1)      (2)      (3)      (4)      (5)      (6)
(1) TOTAL TIME SPENT ON THE INTERNET
  Pearson Correlation           1     .082    -.103     .090    -.091    -.135
  Sig. (2-tailed)               .     .486     .379     .444     .438     .247
  N                            93       75       75       75       75       75
(2) NUMBER OF HOURS WORKED LAST WEEK
  Pearson Correlation        .082        1  -.871**   .965**  -.981**  -.361**
  Sig. (2-tailed)            .486        .     .000     .000     .000     .000
  N                            75      176      176      176      176      176
(3) Logarithm of HRS1 [LG10(81-HRS1)]
  Pearson Correlation       -.103  -.871**        1  -.940**   .946**   .743**
  Sig. (2-tailed)            .379     .000        .     .000     .000     .000
  N                            75      176      176      176      176      176
(4) Square of HRS1 [(HRS1)**2]
  Pearson Correlation        .090   .965**  -.940**        1  -.993**  -.475**
  Sig. (2-tailed)            .444     .000     .000        .     .000     .000
  N                            75      176      176      176      176      176
(5) Square Root of HRS1 [SQRT(81-HRS1)]
  Pearson Correlation       -.091  -.981**   .946**  -.993**        1   .502**
  Sig. (2-tailed)            .438     .000     .000     .000        .     .000
  N                            75      176      176      176      176      176
(6) Inverse of HRS1 [-1/(81-HRS1)]
  Pearson Correlation       -.135  -.361**   .743**  -.475**   .502**        1
  Sig. (2-tailed)            .247     .000     .000     .000     .000        .
  N                            75      176      176      176      176      176

**. Correlation is significant at the 0.01 level (2-tailed).
Columns (1) to (6) are the same variables as rows (1) to (6).
The probability associated with the correlation coefficient between "number of hours worked in the past week" and "total hours spent on the Internet" (0.486) is greater than the level of significance, suggesting either that the relationship is non-linear (if one of the transformations is significant) or weak (if none of the correlations is significant).
The lack of statistical significance for all of the transformations suggests that there is a weak relationship between "number of hours worked in the past week" and "total hours spent on the Internet", and the lack of relationship is not attributable to non-linearity.
Slide 45
Problem 2 - Answer
Without any evidence that there is a non-linear relationship, the answer to the question is true. There is a weak or very weak linear relationship.
Slide 46
Problem 3
Slide 47
The correlation matrix
The correlation between "highest year of school completed" and "occupational prestige score" was statistically significant (r=.495, p<0.001). A linear relationship exists between these variables.
Slide 48
The partial correlation matrix - 1
Controlling for "highest year of school completed", the partial correlation for several of the transformations indicated a statistically significant relationship to "occupational prestige score": the logarithmic transformation (r=-0.254, p<0.001); the square root transformation (r=-0.257, p<0.001); the inverse transformation (r=-0.232, p<0.001); and the square transformation (r=0.246, p<0.001).
Slide 49
The partial correlation matrix - 2
The partial correlation of 0.2463 between the square of "highest year of school completed" [SQEDUC=EDUC²] and "occupational prestige score" [prestg80], controlling for "highest year of school completed" [educ], was higher than the other partial correlations, so the square transformation should be included in the analysis.
Slide 50
Problem 3 - Answer
The relationship between the independent and dependent variables is linear, and the square transformation supports a statistically significant addition to the relationship. The answer to the question is true.
Slide 51
Other problems on assumption of linearity
A problem may ask about the assumption of linearity for a nominal level variable. The answer will be “An incorrect application of a statistic” since linearity does not apply to nominal variables.
A problem may ask about the assumption of linearity for an ordinal level variable. If the variable or transformed variable is linear, the correct answer to the question is “True with caution” since we may be required to defend treating an ordinal variable as metric.
Questions will specify a level of significance to use in testing the correlations and partial correlations.
Slide 52
Steps in answering questions about the assumption of linearity – question 1
Question: Is relationship between dependent variable and independent variable linear?
Step 1. Are all of the variables to be evaluated metric?
  No: Incorrect application of a statistic
  Yes: go to step 2
Step 2. Correlation for untransformed variables statistically significant?
  Yes: go to step 4
  No: go to step 3
Step 3. Correlation for transformed variables statistically significant?
  Yes: False (non-linear)
  No: go to step 4
Step 4. Either variable ordinal level?
  No: True (linear)
  Yes: True with caution (linear)
Slide 53
Steps in answering questions about the assumption of linearity – question 2
Question: Is relationship between dependent variable and independent variable linear, but improvable?
Step 1. Are all of the variables to be evaluated metric?
  No: Incorrect application of a statistic
  Yes: go to step 2
Step 2. Correlation for untransformed variables statistically significant?
  No: go to step 3
  Yes: go to step 4
Step 3. Correlation for transformed variables statistically significant?
  Yes: False (non-linear)
  No: False (weak linear, not improvable)
Step 4. Correlation for transformed variables statistically significant?
  No: False (linear, not improvable)
  Yes: True/True with caution (linear, improvable)
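One reading of the question 2 flowchart can be written as a function. This is a sketch under several assumptions: the function and parameter names are hypothetical, the branch order is an interpretation of the chart, and for a significant untransformed correlation the check on the transformations uses the partial correlations described on Slides 16 and 42. The first example uses the Sig. (2-tailed) values from the TV hours correlation matrix in Problem 1, with an assumed level of significance of 0.05.

```python
def linear_but_improvable(all_metric, p_original, p_transformed, p_partial,
                          alpha, any_ordinal=False):
    """Answer 'Is the relationship linear, but improvable?'.

    all_metric: False if a variable is nominal, so linearity does not apply.
    p_original: probability for the correlation of the untransformed variables.
    p_transformed: probabilities for correlations with the transformed variables.
    p_partial: probabilities for the partial correlations of the transformations.
    alpha: level of significance stated in the problem.
    any_ordinal: True if an ordinal variable is being treated as metric.
    """
    if not all_metric:
        return "Incorrect application of a statistic"
    if p_original > alpha:
        if any(p <= alpha for p in p_transformed):
            return "False (non-linear)"
        return "False (weak linear, not improvable)"
    if any(p <= alpha for p in p_partial):
        if any_ordinal:
            return "True with caution (linear, improvable)"
        return "True (linear, improvable)"
    return "False (linear, not improvable)"

# Problem 1: untransformed p = .079; the square transformation has p = .006.
problem_1 = linear_but_improvable(
    True, 0.079, [0.397, 0.006, 0.203, 0.713], [], alpha=0.05
)
print(problem_1)  # False (non-linear)
```

The result agrees with the answer given for Problem 1, where the transformation is substituted for the independent variable rather than added to the analysis.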