

STATISTICS 252 INTRODUCTION TO APPLIED STATISTICS II

SPSS LAB MANUAL

Science Department Grant MacEwan University

Contributions by: Dr. Kathleen Lawry-Batty, Dr. Christina Anton, Allan Wesley, Dr. Muhammad Islam, Dr. Karen Buro, and Dr. Wanhua Su

Student Name: _________________________________________________
Student I.D: ____________________________________________________

Lab Section
Day: __________________________________________________________
Time: _________________________________________________________
Room: ________________________________________________________

Lab Instructor
Name: ________________________________________________________
Office: ________________________________________________________
Email: _________________________________________________________
Telephone: _____________________________________________________


Table of Contents

Introduction

Chapter 1 Introduction to SPSS
  1.1 Starting SPSS
  1.2 The SPSS Environment
    1.2.1 Title Bar
    1.2.2 Windows in SPSS
    1.2.4 Data Editor Menu Bar
    1.2.5 Data Editor Toolbar
  1.3 Entering Data and Defining Variables
    1.3.1 Defining Variables
    1.3.2 Entering and Editing Data
  1.4 Saving and Reading Data Files
    1.4.1 Saving Data Files
    1.4.2 Reading Existing Data Files
  1.5 Manipulating Data
    1.5.1 Creating a New Variable
    1.5.2 Recoding Variables
    1.5.3 Selecting Cases
    1.5.4 Sorting Data
  1.6 Drawing a Graph
  1.7 Computation of Numerical Summaries
    1.7.1 Frequencies
    1.7.2 Descriptives
    1.7.3 Explore
  1.8 Saving Your Work
    1.8.1 Saving Results
    1.8.2 Transferring Output into Word
  1.9 Printing Your Output

Chapter 2 Parametric Procedures
  2.1 Inference About One Population Mean Using the One Sample t Procedure
  2.2 Inference About Two Population Means Using the Two Sample t Procedure
  2.3 Inferences about Two Population Means Using the Paired t Procedure

Chapter 3 ANOVA
  3.1 The Analysis of Variance F-test for Equality of k Population Means
  3.2 Linear Combinations and Multiple Comparisons of Means (More)
  3.3 Randomized Block Designs
  3.4 2-Way ANOVA

Chapter 4 Non-Parametric Statistics
  4.1 Wilcoxon (Mann-Whitney) Rank Sum Test
  4.2 Inferences About Two Population Medians Using Wilcoxon's Signed Rank Tests
  4.3 The Kruskal-Wallis Test

Chapter 5 Simple Linear Regression
  5.1 Linear Regression Model
  5.2 Residual Analysis

Chapter 6 Multiple Linear Regression
  6.1 The Multiple Regression Model
  6.2 Dummy Variables in Regression Analysis
  6.3 Selecting the Best Regression Equation


Introduction

This manual covers the basic instructions for the computer lab component of Statistics 252. It is to be used in conjunction with SPSS (Version 18.0) for Windows XP (or higher versions) and the corresponding textbook in STAT 252. It is written for those who have no previous computer experience.

The purpose of the computer lab is to familiarize students with the use of a statistical software package, provide them with extra practice in the interpretation of statistical analyses, and demonstrate some interesting applications of statistics. SPSS, which stands for Statistical Package for the Social Sciences, is a powerful, easy-to-use statistical software package that provides a wide range of basic and advanced data analysis capabilities. SPSS's straightforward command structure makes it accessible to users with a wide variety of backgrounds and experience.

On one hand, statistics is a highly technical subject, with complex formulas and equations that seem to be written almost entirely in Greek; on the other hand, statistics is interesting and relevant because it provides the means for using data to gain insight into real-world problems. To really understand the material presented in this course, you must get involved. Throughout the manual we illustrate how to do particular types of analyses step by step. In the lab assignments you will work out similar analyses. Through this process you will learn statistics by doing statistics. To maximize your learning, we recommend that you read the text and simultaneously follow along on your computer.


Chapter 1 Introduction to SPSS

In this chapter the reader will be introduced to SPSS. After studying this chapter you should be able to:

1. Start SPSS
2. Use different SPSS windows
3. Enter, edit, and manipulate variables and cases
4. Display numerical summaries
5. Create and edit graphs
6. Save, export and print your results
7. Exit SPSS

1.1 Starting SPSS

After you log onto your computer, double click on the SPSS icon on your computer's desktop. Alternatively, from the taskbar on your desktop:

1. Click on the Start button
2. Click on All Programs
3. Click on the SPSS Inc. menu
4. Click on PASW Statistics 18.0

1.2 The SPSS Environment

After starting SPSS, a window as in Figure 1.2 opens. Pictured is the default configuration for SPSS 18.0.

Figure 1.2: SPSS Data Editor Window


The SPSS environment consists of different windows and bars. As you perform data analysis, you will work with those bars and windows. Here is a brief description of the different parts in the SPSS environment.

1.2.1 Title Bar

At the top of the main window is the title bar, which shows the SPSS icon and three window buttons. The SPSS icon allows resizing, minimizing, closing, etc. of SPSS. The window control buttons have similar functions.

1.2.2 Windows in SPSS

The four most important windows in SPSS are:

1. Data Editor: opens automatically when you start an SPSS session, and displays the contents of the current data file
2. Viewer: opens automatically the first time you run a procedure that generates output, and displays the results of the statistical procedures
3. Chart Editor: opens only after SPSS produces a plot or diagram, and is used for editing
4. Syntax Editor: is used if you wish to run SPSS commands instead of clicking the pull-down menus

Each window has its own menu and toolbar. The Analyze and Graphs menus are available in all windows, so that new output can be generated without switching windows. To activate a window, click on the edge of the desired window, or select the window from the Window menu.

1.2.4 Data Editor Menu Bar

The Menu bar is the second horizontal line from the top. It provides easy access to most SPSS features, and it contains twelve drop-down menus:

Figure 1.3: Data Editor Menu Bar

1.2.5 Data Editor Toolbar

Beneath the menu bar is the toolbar, which provides shortcuts for several important actions. When you click on a button, SPSS performs an action or opens a dialog box corresponding to the menu command. If you place the mouse pointer over a button without clicking, SPSS displays a brief description of the tool in the Status Bar.

Figure 1.4: Data Editor Toolbar


1.3 Entering Data and Defining Variables

Data can be manually entered into the Data Editor window. The main components of this window are displayed in Figure 1.2. The following example illustrates data entry and modification, descriptive statistics, and graphical summaries.

Example 1. The data you are about to analyze have been collected from an exam given to 40 students in an introductory statistics course. Half of the students in the class were female, and the other half were male. Two variables were measured for each student:

1. Marks (mark of the student, 40-90)
2. Sex (sex of the student: M = male, F = female)

Marks of the female students: 85, 83, 56, 98, 72, 52, 88, 75, 91, 69, 78, 64, 78, 81, 74, 73, 90, 75, 65, 55
Marks of the male students: 40, 47, 50, 52, 58, 61, 62, 63, 64, 67, 70, 72, 74, 75, 78, 80, 81, 82, 90, 92

1.3.1 Defining Variables

To define a variable:

1. Click on the Variable View tab at the bottom of the Data Editor window (see Figure 1.2)
2. Enter a new variable name in the column Name and press the Enter key on the keyboard.

Variable names must begin with a letter and cannot end with a period. After entering the name, default values (Type, Width, ...) are assigned. To manually select the data type, click on the corresponding cell in the column Type.

Figure 1.5: Selecting the Type


Figure 1.6: Defining the Values

In our example we have two variables: Marks (numeric) and Sex (categorical). For the categorical variable, click on the radio button for String in the Variable Type dialog box (see Figure 1.5). To define the possible values of the variable Sex (M for male and F for female), click on the Value cell in the row for the variable, enter each value with the corresponding label, and then click on the Add button (see Figure 1.6). To delete a variable, select the corresponding row and press the Delete key on your keyboard, or click on Edit>Clear. To insert a new variable between existing variables:

1. Click on the row below the place you wish to insert the variable
2. Click on Data>Insert variable

1.3.2 Entering and Editing Data

To enter data, switch from Variable View to Data View. The Data View window is a grid with rows corresponding to subjects (or cases), and columns corresponding to variables (Marks and Sex in our case). The cells in the spreadsheet contain only data values; they cannot contain formulas. Enter the values for all cases from Example 1. Data values are not recorded until you press Enter or select another cell.


Figure 1.7: Entering Data

1. To correct a value in a cell:

   (a) Click on the cell, type the correct value, and press Enter.
   (b) To change a portion of the cell contents, double-click on the cell and use the arrow, Backspace, and Delete keys to make the changes.
   (c) To delete the values in a range, highlight the area and press Delete.
   (d) To undo a change use Edit>Undo.

2. To insert a new case (row):

   (a) Select the row below the row where you wish to insert
   (b) Click on Edit>Insert case

3. To delete a row:

   (a) Click on the case number
   (b) Click on Edit>Clear

4. To copy data using copy and paste commands:

   (a) Highlight the data to be copied.
   (b) Choose Edit>Copy
   (c) Select the place where the cells are to be pasted (change the column type if necessary)
   (d) Click on Edit>Paste


5. To change the width of one or more columns:

   (a) Highlight the column(s).
   (b) With your mouse, point to the line dividing a selected column from another column. The cursor becomes a two-sided arrow.
   (c) Drag the border until the columns have the desired width.

1.4 Saving and Reading Data Files

It is strongly recommended that you save your work regularly. This will prevent you from having to re-enter data should the computer crash.

1.4.1 Saving Data Files

To save a new SPSS data file, make the Data Editor the active window and follow these steps:

1. Select File>Save As
2. In the Save in box, select your network drive or the name of your USB flash drive.
3. In the File Name box, type lab1 and click on Save.

The current data file will now be saved as lab1.sav, where sav is the file extension. For future saves (to overwrite the old version of the current file with the new version), simply use File>Save or the Ctrl-S keys. To save a data file as an ASCII file, choose Tab-delimited (*.dat) in the Save As type box. To practice, save the data from Example 1 as lab1.dat.
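If you prefer typing commands in the Syntax Editor (see Section 1.2.2), the saves above can be scripted. The following is a minimal sketch using the file names from this example; subcommand details may vary slightly across SPSS versions:

    * Save the active data set as an SPSS data file.
    SAVE OUTFILE='lab1.sav'.
    * Save the same data as a tab-delimited text file with variable names in the first row.
    SAVE TRANSLATE OUTFILE='lab1.dat' /TYPE=TAB /FIELDNAMES /REPLACE.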

Figure 1.8: Saving Data

1.4.2 Reading Existing Data Files

SPSS can read different types of data files.

1. To read an SPSS data file:

   (a) Select File>Open>Data
   (b) Click on the data file you wish to open

SPSS data files have the extension .sav, and they contain not only the data, but also the variable names and formats.


2. To read a text file:

   (a) Select File>New>Data
   (b) Select File>Read Text Data
   (c) In the Files of Type box choose the right extension (usually .dat or .txt)
   (d) In the File Name box choose the appropriate path, and click on the text file you wish to open
   (e) Transfer the data from the text file to the Data Editor window using the Text Import Wizard

To practice, open a new Data Editor window and transfer the data from the lab1.dat file.
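The equivalent syntax, again as a rough sketch with an illustrative path:

    * Open an existing SPSS data file.
    GET FILE='lab1.sav'.
    * Text files are read with GET DATA /TYPE=TXT; the Text Import Wizard can
    * paste the full command for you, so its subcommands need not be typed by hand.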

1.5 Manipulating Data

1.5.1 Creating a New Variable

New variables can be created using Transform > Compute in the Data Editor menu. A dialog box will appear; type the name of the target variable and the expression that defines it, then click OK. The If... dialog box allows you to apply data transformations to selected subsets of cases. For example, to create the variable Y using the formula Y = 4*Marks:

1. Choose Transform > Compute
2. Type Y in the Target Variable box
3. Type 4*marks in the Numeric Expression box and click OK.

Your screen should look as in Figure 1.9.

Figure 1.9: Compute Variable Dialog Box

Observe that there is a ruler icon for the numeric variables (in our example for the variable marks) and a balloon icon for the string variables (e.g. for the variable sex).
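Most dialog boxes, including Compute Variable, also have a Paste button that writes the equivalent command into the Syntax Editor. A sketch of the syntax for this example:

    * Create Y as four times the mark for every case.
    COMPUTE Y = 4*marks.
    * The If... dialog box corresponds to a conditional command, e.g.
    * IF (sex = 'F') Y = 4*marks.
    EXECUTE.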


1.5.2 Recoding Variables

You may recode the values of a variable into the same variable, or into a new variable formed with the recoded values. The preferred method is to use a new variable in order to preserve the original values.

Recoding into the same variable

Suppose that the categorical variable sex needs to be recoded as a numerical variable (some tools will not work with categorical data) by assigning F=0 and M=1. To recode into the same variable:

Figure 1.10: Recoding into the same variable

1. Select Transform > Recode into Same Variable
2. Double click on the variable you want to recode (Sex)
3. Select Old and New Values
4. Enter the old and the new values and click on the Add button
5. Select Continue to close the Old and New Values dialog box when you have indicated all the recode instructions
6. Click on OK to have SPSS execute your instructions

Figure 1.11: Selecting the Old and the New Values


In the Data Editor you will find that sex is now expressed by one of the two integers 0 or 1. Click on the Variable View tab and change its type to numeric.

Recoding into a different variable

Referring to our example, we will define a new variable Grades with the values in Table 1.1. To recode into the new variable:

Grade   Marks
A+      95-100
A       90-94
A-      85-89
B+      80-84
B       75-79
B-      70-74
C+      65-69
C       60-64
C-      55-59
D+      50-54
D       45-49
F       0-44

Table 1.1: The variable Grades

1. Select Transform > Recode Into Different Variables
2. Double click on the variable you want to recode (marks) and write the name and the label of the new variable (Grades)
3. Click on Change
4. Select Old and New Values
5. Check the Output variables are strings button, since Grades is categorical.
6. Choose the appropriate range for the old values, enter new values, and click on the Add button
7. Select Continue to close the Old and New Values dialog box when you have indicated all the recode instructions
8. Click on OK to close the Recode into Different Variables dialog box

Figure 1.12: Recoding into Different Variables


Figure 1.13: Defining the New Values
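For reference, here is a hedged sketch of RECODE syntax for both operations in this section. Syntax cannot change a string variable to numeric in place, so the first command recodes sex into a new numeric variable; the name sexnum is illustrative only:

    * Recode the string variable sex into a new numeric variable (F=0, M=1).
    RECODE sex ('F'=0) ('M'=1) INTO sexnum.
    * Recode marks into the new string variable Grades using the ranges in Table 1.1.
    STRING Grades (A2).
    RECODE marks (95 thru 100='A+') (90 thru 94='A') (85 thru 89='A-')
                 (80 thru 84='B+') (75 thru 79='B') (70 thru 74='B-')
                 (65 thru 69='C+') (60 thru 64='C') (55 thru 59='C-')
                 (50 thru 54='D+') (45 thru 49='D') (0 thru 44='F') INTO Grades.
    EXECUTE.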

1.5.3 Selecting Cases

Sometimes you might want to select a subset of cases from your data file for a particular analysis. To select a subset of cases:

1. Click on Data > Select Cases
2. Click on the If condition is satisfied radio button
3. Double click on the names of the variables, complete the condition, and click on Continue
4. Click on OK

Figure 1.14: Selecting cases


For practice, select only the female students who got a B. When you return to the Data Editor window, you should notice a new column labelled filter_$, containing 0 for the unselected and 1 for the selected cases. The row numbers of the unselected cases are also marked with an oblique bar. See Figures 1.15 and 1.16. Any further analysis of the data will include only the selected cases! To undo the selection, click on Data > Select Cases > All Cases > OK.
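The Select Cases dialog builds essentially the following syntax (filter_$ is the variable SPSS itself creates); this sketch assumes the Grades variable from Section 1.5.2:

    USE ALL.
    * Flag the cases to keep: female students who got a B.
    COMPUTE filter_$ = (sex = 'F' & Grades = 'B').
    FILTER BY filter_$.
    EXECUTE.
    * To undo the selection:
    FILTER OFF.
    USE ALL.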

1.5.4 Sorting Data

Suppose that we would like to sort the data according to the students' marks. To do this:

1. Click on Data>Sort Cases
2. Double click on the variable(s) you want to sort by (marks)
3. Select the appropriate Sort order button
4. Click on OK

The cases in the Data Editor are now sorted in ascending/descending order according to marks.
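The corresponding syntax is a single command; (A) sorts ascending and (D) descending:

    SORT CASES BY marks (A).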

Figure 1.15: Completing the Condition

Figure 1.16: The Data Editor Window with Selected Cases


Figure 1.17: Sorting Data

1.6 Drawing a Graph

Figure 1.18: Drawing a Histogram

To draw any graph (histogram, boxplot, pie chart, bar graph, etc.):

1. Click on Graph in the menu bar
2. Click on Legacy Dialogs
3. Select the appropriate option.

For example, to draw a histogram of Marks:

1. Choose Graph > Legacy Dialogs > Histogram...
2. Double click on the variable marks
3. Choose Titles and type Histogram of Marks

The result will be displayed in the Viewer window (see Figure 1.19).
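A sketch of the equivalent syntax:

    GRAPH /HISTOGRAM=marks /TITLE='Histogram of Marks'.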


Figure 1.19: Histogram of Marks

To edit a chart, double-click on the chart in the Viewer window. This displays the chart in the Chart Editor window (see Figure 1.20). You can edit the chart from the menus, from the toolbar, or by double-clicking on the object you want to edit.

Figure 1.20: Chart Editor Window


Double-clicking an object is a useful shortcut for editing a chart (changing bar colours, titles, etc.). If you want to change the colour of the bars, double click on the bars, click the Fill & Border tab and select your colour (you can change the colour of the bars and the border). Selecting the Chart Size tab will allow you to change the size of your histogram, while selecting the Binning tab lets you change where the bars begin and end.

1.7 Computation of Numerical Summaries

SPSS offers a wide variety of statistical tools to help you analyze your data, such as descriptive statistics, t-tests, and correlations:

1. Click on Analyze
2. Select the required statistical tool.

For example, look at some descriptive statistics for the marks of the 40 students. From the main menu, first select Analyze > Descriptive Statistics, and then choose one of the sub-menus: Frequencies, Descriptives, or Explore (see Figure 1.21). They contain the procedures for describing and exploring continuous data.

Figure 1.21: Computation of Numerical Summaries


1.7.1 Frequencies

1. Click on Analyze > Descriptive Statistics > Frequencies
2. Double-click on the variable you want to analyze (Marks)
3. Select the Statistics... tab
4. Check the measures of central tendency and dispersion you are interested in, and click on Continue.
5. Click on OK

Notice that you can also draw a histogram using the Charts tab. The results will be displayed in the Viewer window.
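A sketch of the equivalent Frequencies syntax, with a few commonly requested statistics and the optional histogram:

    FREQUENCIES VARIABLES=marks
      /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM
      /HISTOGRAM.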

Figure 1.22: The Frequencies Dialog Box

Figure 1.23: The Statistics Dialog Box


1.7.2 Descriptives

The descriptive statistics can also be obtained by selecting Analyze > Descriptive Statistics > Descriptives > Options. The output is displayed in the Viewer window (see Figure 1.24). The Viewer window is divided into two regions:

1. The Outline pane: contains the table of contents;
2. The Contents pane: contains the tables, charts, and text output.

Figure 1.24: The Summary Statistics

The first three components in the Outline pane (Title, Notes, Statistics) have been obtained with the Frequencies procedure. The last three components (Title, Notes, Descriptive Statistics) correspond to the Descriptives procedure. An open book icon next to an item indicates that it is visible in the Contents pane. When you double-click on the book icon, the table or chart is hidden, without being deleted. To change the position of tables or charts in the display, click on the items in the Outline pane, and drag them. Try to switch the position of the two tables in Figure 1.24.
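A sketch of the equivalent Descriptives syntax (the /STATISTICS list mirrors the choices in the Options dialog):

    DESCRIPTIVES VARIABLES=marks
      /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM.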

Figure 1.25: The Descriptives Dialog Box


1.7.3 Explore

Explore allows quantitative variables to be classified by the categories of a qualitative variable. You can also identify outliers and plot side-by-side boxplots, stem-and-leaf displays, and histograms.

Figure 1.26: The Explore Dialog Box

Figure 1.27: The Explore>Statistics Dialog Box

To practice, obtain a side-by-side boxplot and the descriptive statistics of the marks of male and female students:

1. Click on Analyze > Descriptive Statistics > Explore...
2. Move the quantitative variable (marks) to the Dependent List and the categorical variable sex to the Factor List using the arrows
3. Select Both in the Display box
4. Click on the Statistics... tab
5. Select Descriptives in the Statistics dialog box and click on Continue
6. Click on the Plots... tab
7. Check Factor levels together in the Boxplots box, and click on Continue
8. Click on OK


Figure 1.28: The Explore: Plots dialog box

As you can see, the marks for the female students are slightly higher than the marks for the male students. There are no outliers. The distribution of the marks of the female students is symmetrical, and the distribution of the marks of the male students is skewed to the left (see Figure 1.30).

Figure 1.30: Side by Side Boxplot


1.8 Saving Your Work

Each part of the SPSS work you have created will need to be saved or printed as a separate file. Saving and printing are selected from the File menu. To ensure you save the correct window, make certain that it is selected first.

1.8.1 Saving Results

To save the contents of the Viewer window, select File>Save from the Viewer menus. All Viewer files have the extension .spv, and they can be opened either from the Data Editor or the Viewer window by clicking on File>Open.

1.8.2 Transferring Output into Word

SPSS output can be transferred to virtually any word processor and edited. Transferring the output allows you to add appropriate comments and to properly format your report.

After you have saved both the data and the results from SPSS, minimize the SPSS window. Minimizing the window allows SPSS to remain active, so it will be easily accessible if you need it. When SPSS is minimized it appears as a program button on the taskbar. Open Word from the Start menu; you can move back to SPSS at any time by clicking on the SPSS program button on the taskbar.

Select the output you want to transfer by clicking the item in the Outline or the Contents pane of the Viewer, then click on Edit > Copy or use the Ctrl-C keys. Now switch to Word, move the cursor to where you want the output to be inserted, and click on the Paste icon on the toolbar, select Edit > Paste, or use the Ctrl-V keys. Once the output has been inserted, you may add the appropriate comments, titles, etc.

If you wish to resize a graph, click on the graph. The border will then contain eight small squares. To resize, click on one of the squares and drag the frame to its desired size. While editing your document you should save it from time to time. This will ensure that you do not lose your work in the event of a computer failure.

1.9 Printing Your Output

Printing your results can be done either directly from SPSS or from your Word document (as you will do for your assignments). To print the contents of the Contents pane in the Viewer window:

1. Select the items you want to print in the Outline or the Contents pane
2. Select File > Print Preview to preview your printout
3. Select File > Print
4. Check the Selection button under Print range in the Print dialog box
5. Click OK


Chapter 2 Parametric Procedures

Objective: After studying this chapter you should be able to:

1. Do inference about one population mean using the one sample t procedure;
2. Do inference about two population means using the two sample t procedure;
3. Do inference about two population means using the paired t procedure.

2.1 Inference About One Population Mean Using the One Sample t Procedure

2.1.1 Two sided confidence intervals and hypothesis tests

Assumptions:

1. The sample is randomly and independently selected from the population.
2. The sampled population is approximately normal.

The one-sample t-procedure is used.

1. Two sided 100(1 − α)% confidence interval for µ:

   x̄ ± t(α/2) · s/√n,  df = n − 1

2. Test statistic for testing H0: µ = δ versus Ha: µ ≠ δ:

   t₀ = (x̄ − δ) / (s/√n),  df = n − 1

Example: A math achievement test is given to a random sample of 13 high school girls. The scores are given here: Scores for girls: 87, 91, 78, 81, 72, 95, 89, 93, 83, 74, 75, 85, 95 The data are given in the SPSS data file girlsachieve.sav on Blackboard.

1. Does the mean achievement score for girls differ from 80? Test at a 5% level of significance.
2. Construct a 95% confidence interval for the mean achievement score for girls.


Solution: Let µ denote the mean achievement score for girls. Hypothesis Test: Perform a one sample t-test:

Step 1. Hypotheses: H0 : µ = 80 versus Ha : µ ≠ 80, α = 0.05

Step 2. Assume the sample was drawn randomly and independently from the population. Draw a boxplot to check the normality assumption.

(a) Select Graphs>Legacy Dialogs>Boxplot and make sure the Simple box is current.
(b) Click Define. In the Define dialog box, use the arrows to select GirlScores as Boxes Represent.
(c) Click OK

Figure 2.1: Define Dialog Box for Drawing a Simple Boxplot for GirlScores

Open the SPSS Viewer to see the output. The boxplot shows that the distribution is approximately symmetric, with no outliers. The t-test is robust to this situation (Figure 2.2). Note that it is prudent to draw at least a histogram or a normal probability plot (explained later) in addition to a boxplot when checking the normality assumption.

Figure 2.2: Boxplot of Math Achievement Scores for Females


Steps 3-6: It is necessary to generate computer output to complete the hypothesis test.

Commands for a one sample t-test using SPSS:

a. Select Analyze>Compare Means>One-Sample T Test
b. Select GirlScores as the test variable(s) (see Figure 2.3)
c. Select 80 as your Test Value
d. Click Options, type 95% for Confidence Interval, and click Continue
e. Click OK

Figure 2.3: One- Sample T test Dialog Box
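The dialog corresponds to the T-TEST command. A sketch for this test; rerunning it with /TESTVAL=0 yields the confidence interval for µ itself, as described below:

    T-TEST /TESTVAL=80
      /VARIABLES=GirlScores
      /CRITERIA=CI(.95).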

One-Sample Statistics

             N    Mean    Std. Deviation   Std. Error Mean
GirlScores   13   84.46   8.038            2.229

One-Sample Test (Test Value = 80)

             t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
GirlScores   2.001   12   .068              4.462             (-.40, 9.32)

Figure 2.4: One Sample T Test Output to obtain test statistic and P-value for test of H0: µ = 80 versus Ha: µ ≠ 80

Step 3: From the output in Figure 2.4, the test statistic is t₀ = 2.001 with 12 degrees of freedom.
Step 4: From the output in Figure 2.4, the P-value = 0.068.
Step 5: Do not reject H0 at the 5% significance level (since 0.068 > 0.05).
Step 6: There is no significant evidence that the mean of the girls' achievement scores differs from 80.


Confidence Interval: The result in Figure 2.4 returns a 95% confidence interval for μ – 80. This is not the standard way in which confidence intervals are generally presented, and it is more usual to present a confidence interval for μ. To obtain a confidence interval for μ, it is necessary to rerun the commands above using 0 as the test value. The output returned is in Figure 2.5.

One-Sample Test (Test Value = 0)

             t        df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
GirlScores   37.888   12   .000              84.462            (79.60, 89.32)

Figure 2.5: One Sample T Output to obtain confidence interval for μ

From the SPSS output, the 95% confidence interval for µ is (79.60, 89.32). We are 95% confident that the mean achievement score for girls falls between 79.60 and 89.32. Since 80 is captured within this confidence interval, we cannot exclude 80 as a possible value for the mean. We do not find sufficient evidence that the mean achievement score for girls differs significantly from 80.

2.1.2 One sided confidence bounds and hypothesis tests

Assumptions:

1. The sample is randomly and independently selected from the population.
2. The sampled population is approximately normal.

The one-sample t-procedure is used.

Right Sided Test:

1. Test statistic for testing H0: µ = δ versus Ha: µ > δ:

   t₀ = (x̄ − δ) / (s/√n),  df = n − 1

2. One sided lower bound with 100(1 − α)% confidence:

   x̄ − t(α) · s/√n,  df = n − 1


Left Sided Test:

1. Test statistic for testing H0: µ = δ versus Ha: µ < δ:

   t₀ = (x̄ − δ) / (s/√n),  df = n − 1

2. One sided upper bound with 100(1 − α)% confidence:

   x̄ + t(α) · s/√n,  df = n − 1

Example: A math achievement test is given to a random sample of 13 high school girls. The scores are given here:
Scores for girls: 87, 91, 78, 81, 72, 95, 89, 93, 83, 74, 75, 85, 95
The data are given in the SPSS data file girlsachieve.sav on Blackboard.

1. Does the mean achievement score for girls exceed 78? Test at a 1% level of significance.
2. Construct a 99% confidence one sided right sided lower bound for the mean achievement score for girls.

Solution: Let µ denote the mean achievement score for girls.
Hypothesis Test: Perform a one sided (right sided) one sample t-test:

Step 1: Hypotheses: H0: µ = 78 versus Ha: µ > 78, α = 0.01

Step 2: Assume the sample was drawn randomly and independently from the population. The boxplot created in Figure 2.2 to check the normality assumption shows that the distribution is approximately symmetric, with no outliers. The t-test is robust to this situation.

Steps 3-6: To generate computer output for the hypothesis test, repeat the command steps shown above using 78 as a Test Value (see Figure 2.6) and 98% as a confidence level (discussed below). SPSS only provides the P-value for a two sided test, so it is necessary to convert the P-value in the output to one for a one-sided test. For this example, the obtained P-value will be divided by 2. (Details on how to handle all possible cases are provided in Table 2.8.)

Note: A 98% two sided confidence interval for µ − 78 will also be returned, and this will not be of interest. However, when a rerun of commands is done with 0 as a Test Value, the 98% confidence interval for µ will be of interest. This is because the 99% confidence one sided right sided lower bound is the same as the lower bound of a 98% two sided confidence interval. Figure 2.7 provides these results. (Details on how to handle all possible situations are provided in Table 2.9.)


Commands:

a. Select Analyze>Compare Means>One-Sample T Test
b. Select GirlScores as the test variable(s)
c. Select 78 as your Test Value
d. Click Options, type 98% for Confidence Interval, and click Continue
e. Click OK

One-Sample Test (Test Value = 78)

             t       df   Sig. (2-tailed)   Mean Difference   98% CI of the Difference (Lower, Upper)
GirlScores   2.899   12   .013              6.462             (.49, 12.44)

Figure 2.6: One Sample T Test Output to obtain test statistic and P-value for test of H0: µ = 78 versus Ha: µ > 78

Step 3: The test statistic is t₀ = 2.899 with 12 degrees of freedom.
Step 4: The P-value = 0.013/2 = 0.0065 (since the P-value in Figure 2.6 is for a two sided test).
Step 5: Reject H0 at the 1% significance level (since 0.0065 < 0.01).
Step 6: There is significant evidence that the mean of the girls' achievement scores exceeds 78.

Confidence Bound:

A 100(1 − α)% one sided right sided lower confidence bound is x̄ − t(α) · s/√n. This is the same as the lower bound of the 100(1 − 2α)% two sided confidence interval (x̄ − t(α) · s/√n, x̄ + t(α) · s/√n).

For α = 0.01, the 100(1 − .01)% = 99% one sided right sided lower confidence bound is x̄ − t(.01) · s/√n. This is the same as the lower bound of the 100(1 − 2(.01))% = 100(1 − .02)% = 98% two sided confidence interval (x̄ − t(.01) · s/√n, x̄ + t(.01) · s/√n).

Therefore, rerun the commands above using 0 as the test value and 98% as the confidence level. The output returned is in Figure 2.7.

One-Sample Test (Test Value = 0)

             t        df   Sig. (2-tailed)   Mean Difference   98% CI of the Difference (Lower, Upper)
GirlScores   37.888   12   .000              84.462            (78.49, 90.44)

Figure 2.7: One Sample T Output to obtain lower confidence bound for μ


From Figure 2.7, the 99% confidence one sided right sided lower bound for µ is 78.49. We are 99% confident that the mean achievement score for girls falls above 78.49. Since 78.49 is above 78, we can exclude 78 as a possible value for the mean. We find sufficient evidence that the mean achievement score for girls exceeds 78. SPSS only provides the P-value for a two sided test, so it is necessary to convert the P-value obtained in the output to a P-value for a one-sided test. Details on how to handle all possible cases are provided in Table 2.8.

Conversion of the two sided P-value (TSP) from SPSS to the one sided P-value (OSP), where t* is the observed test statistic:

Ha: µ > µ0, t* positive:  TSP = 2P(t > t*), so OSP = P(t > t*) = TSP/2
Ha: µ > µ0, t* negative:  TSP = 2P(t < t*), so OSP = P(t > t*) = 1 − P(t < t*) = 1 − TSP/2
                          (Unlikely scenario: x̄ would be less than µ0, which is not as expected when the test was set up.)
Ha: µ < µ0, t* negative:  TSP = 2P(t < t*), so OSP = P(t < t*) = TSP/2
Ha: µ < µ0, t* positive:  TSP = 2P(t > t*), so OSP = P(t < t*) = 1 − P(t > t*) = 1 − TSP/2
                          (Unlikely scenario: x̄ would be greater than µ0, which is not as expected when the test was set up.)

Table 2.8: P-value Adjustment Table

Finding a 100(1 − α)% one sided confidence bound using SPSS requires that the confidence level used to perform the test be 100(1 − 2α)%. Details are included in Table 2.9.

Right Sided: A 100(1 − α)% one sided right sided lower confidence bound is x̄ − t(α) · s/√n. This is the same as the lower bound of the 100(1 − 2α)% two sided confidence interval (x̄ − t(α) · s/√n, x̄ + t(α) · s/√n).

Left Sided: A 100(1 − α)% one sided left sided upper confidence bound is x̄ + t(α) · s/√n. This is the same as the upper bound of the 100(1 − 2α)% two sided confidence interval (x̄ − t(α) · s/√n, x̄ + t(α) · s/√n).

Table 2.9: Adjustment Table for One Sided Confidence Intervals

Example: A math achievement test is given to a random sample of 13 high school girls. The scores are given here:
Scores for girls: 87, 91, 78, 81, 72, 95, 89, 93, 83, 74, 75, 85, 95
The data are given in the SPSS data file girlsachieve.sav on Blackboard.

1. Is the mean achievement score for girls below 87? Test at a 2% level of significance.
2. Construct a 98% confidence one sided left sided upper bound for the mean achievement score for girls.


Solution: Let µ denote the mean achievement score for girls.
Hypothesis Test: Perform a one sided (left sided) one sample t-test:

Step 1: Hypotheses: H0: µ = 87 versus Ha: µ < 87, α = 0.02

Step 2: Assume the sample was drawn randomly and independently from the population. The boxplot created in Figure 2.2 to check the normality assumption shows that the distribution is approximately symmetric, with no outliers. The t-test is robust to this situation.

Steps 3-6: To generate computer output for the hypothesis test, repeat the command steps shown above using 87 as a Test Value (see Figure 2.11) and 96% as a confidence level. Note: A 96% two sided confidence interval for µ − 87 will also be returned, and this will not be of interest. However, when a rerun of commands is done with 0 as a Test Value, the 96% confidence interval for µ will be of interest. This is because the 98% confidence one sided left sided upper confidence bound is the same as the upper bound of a 96% two sided confidence interval. Figure 2.12 provides these results.

1. Conduct a one sample t-test using SPSS:

   a. Select Analyze>Compare Means>One-Sample T Test
   b. Select GirlScores as the test variable(s)
   c. Select 87 as your Test Value
   d. Click Options, type 96% for Confidence Interval, and click Continue
   e. Click OK

One-Sample Test (Test Value = 87)

             t        df   Sig. (2-tailed)   Mean Difference   96% CI of the Difference (Lower, Upper)
GirlScores   -1.139   12   .277              -2.538            (-7.67, 2.59)

Figure 2.11: One Sample T Test Output to obtain test statistic and P-value for test of H0: µ = 87 versus Ha: µ < 87

Step 3: The test statistic is t₀ = -1.139 with 12 degrees of freedom.
Step 4: The P-value = 0.277/2 = 0.1385 (since the P-value in Figure 2.11 is for a two sided test).
Step 5: Do not reject H0 at the 2% significance level (since 0.1385 > 0.02).
Step 6: There is no significant evidence that the mean of the girls' achievement scores lies below 87.

Confidence Interval:

A 100(1 − α)% one sided left sided upper confidence bound is x̄ + t(α) · s/√n. This is the same as the upper bound of the 100(1 − 2α)% two sided confidence interval (x̄ − t(α) · s/√n, x̄ + t(α) · s/√n).

For α = 0.02, the 100(1 − .02)% = 98% one sided left sided upper confidence bound is x̄ + t(.02) · s/√n. This is the same as the upper bound of the 100(1 − 2(.02))% = 100(1 − .04)% = 96% two sided confidence interval (x̄ − t(.02) · s/√n, x̄ + t(.02) · s/√n).

Therefore, rerun the commands above using 0 as the test value and 96% as the confidence level. The output returned is in Figure 2.12.

One-Sample Test (Test Value = 0)

             t        df   Sig. (2-tailed)   Mean Difference   96% CI of the Difference (Lower, Upper)
GirlScores   37.888   12   .000              84.462            (79.33, 89.59)

Figure 2.12: One Sample T Output to obtain upper confidence bound for μ

From Figure 2.12, the 98% confidence upper bound for µ is 89.59. We are 98% confident that the mean achievement score for girls falls below 89.59. Since 89.59 is not below 87, we cannot exclude 87 as a possible value for the mean. We do not have sufficient evidence that the mean achievement score for girls lies below 87. (The upper bound is not below 87).

2.2 Inference About Two Population Means Using the Two Sample t Procedure

Confidence intervals and hypothesis tests

Assumptions:

1. The samples are randomly and independently selected from the populations.
2. Both sampled populations are approximately normal.

If the population variances are equal, the two-sample pooled t-procedure can be used:

1. Test statistic for testing H0: µ1 − µ2 = δ versus Ha: µ1 − µ2 ≠ δ or Ha: µ1 − µ2 > δ or Ha: µ1 − µ2 < δ:

   t₀ = (x̄1 − x̄2 − δ) / (sp √(1/n1 + 1/n2)),  df = n1 + n2 − 2

   where sp is the pooled estimate of σ:

   sp = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]

2. 100(1 − α)% two sided confidence interval for µ1 − µ2:

   (x̄1 − x̄2) ± t(α/2) · sp √(1/n1 + 1/n2),  df and sp as above

3. 100(1 − α)% lower bound for a one sided right sided confidence bound for µ1 − µ2:

   (x̄1 − x̄2) − t(α) · sp √(1/n1 + 1/n2),  df and sp as above

4. 100(1 − α)% upper bound for a one sided left sided confidence bound for µ1 − µ2:

   (x̄1 − x̄2) + t(α) · sp √(1/n1 + 1/n2),  df and sp as above

If the population variances are not equal, we can use the two sample non-pooled Welch t-test:

1. Test statistic for testing H0: µ1 − µ2 = δ versus Ha: µ1 − µ2 ≠ δ or Ha: µ1 − µ2 > δ or Ha: µ1 − µ2 < δ:

   t₀ = (x̄1 − x̄2 − δ) / √(s1²/n1 + s2²/n2)

   where df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)], or approximately df = min(n1 − 1, n2 − 1)

2. 100(1 − α)% two sided confidence interval for µ1 − µ2:

   (x̄1 − x̄2) ± t(α/2) · √(s1²/n1 + s2²/n2),  df as above

3. 100(1 − α)% lower bound for a one sided right sided confidence bound for µ1 − µ2:

   (x̄1 − x̄2) − t(α) · √(s1²/n1 + s2²/n2),  df as above

4. 100(1 − α)% upper bound for a one sided left sided confidence bound for µ1 − µ2:

   (x̄1 − x̄2) + t(α) · √(s1²/n1 + s2²/n2),  df as above

Example: A math achievement test is given to a random sample of 25 high school students. The scores are given here: Scores for girls: 87, 91, 78, 81, 72, 95, 89, 93, 83, 74, 75, 85, 95 Scores for boys: 68, 87, 67, 74, 81, 93, 60, 78, 74, 92, 81, 62 The data are given in the SPSS data file achieve.sav on Blackboard.


1. Is there a significant difference between the mean scores for boys and girls? Test at a 5% level of significance.
2. Construct a 95% confidence interval for the difference in the mean scores between boys and girls.

Solution: Let µ1 denote the mean score for girls, and µ2 denote the mean score for boys. Hypothesis Test: Perform the following two sample t-test:

Step 1. Hypotheses: H0: µ1 − µ2 = 0 versus Ha: µ1 − µ2 ≠ 0 (can also be stated as H0: µ1 = µ2 versus Ha: µ1 ≠ µ2)

Step 2. We assume that the samples are randomly and independently selected from the populations. Draw boxplots to check the normality assumptions of the test:

(a) Select Graphs>Legacy Dialogs>Boxplot and check the Summaries for groups of cases button
(b) Click Define. In the Define dialog box, use the arrows to select Scores as Variable and Sex for the Category Axis (see Figure 2.13)
(c) Click OK

Figure 2.13: Define Dialog Box for Drawing Boxplots of Score by Sex

Open the SPSS Viewer to see the output.

Figure 2.14: Boxplot of Math Achievement Scores for Males and Females


The side-by-side boxplot shows that the distributions are approximately symmetric, with no outliers, and that the spreads, i.e. the variances, are almost the same. The boxplot also shows that overall the girls scored higher than the boys (see Figure 2.14). Further exploration of the data with histograms and/or probability plots would give a better idea of the shape of the sample data and the population data.

Steps 3-6: To generate computer output for the hypothesis test, perform the steps below. Conduct a two sample t-test using SPSS:

1. Select Analyze>Compare Means>Independent-Samples T Test
2. Select Scores as the test variable(s) and Sex as the grouping variable (see Figure 2.15)
3. Click Define Groups, type F for Group 1 and M for Group 2, and click Continue (see Figure 2.16)
4. Click Options, type 95% for Confidence Interval, and click Continue
5. Click OK

Figure 2.15: Independent Sample T test Dialog Box

Figure 2.16: Define Groups Dialog Box
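A sketch of the equivalent syntax for the independent-samples test:

    T-TEST GROUPS=Sex('F' 'M')
      /VARIABLES=Scores
      /CRITERIA=CI(.95).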

Results for both the pooled (for equal variances) and the Welch t-tests (for unequal variances) can be found in the SPSS output. From the output in Figure 2.17, we see that the standard deviations are relatively close in value (8.038 for the girls' scores and 10.967 for the boys' scores), and that the P-value = 0.278 for Levene's test of the equality of variances. Since this P-value is greater than 0.05, the data do not provide sufficient evidence that the variances are different. So we might assume equal variances and apply the pooled t-test. Steps 3 to 6 for the pooled t-test follow.

Step 3: The test statistic is t₀ = 2.104 with 23 degrees of freedom (df = n1 + n2 − 2 = 13 + 12 − 2).
Step 4: The P-value = 0.047


Step 5: Reject H0 at the 5% significance level (since 0.047 ≤ 0.05).
Step 6: There is significant evidence that the mean scores of girls and boys differ.

But be warned that this result is only true if the variances are equal! Is this really a reasonable assumption? It is usually safer to use Welch's t-test unless there are theoretical reasons to assume equal variances. Steps 3 to 6 for Welch's t-test follow.

Step 3: The test statistic is t₀ = 2.078 with 20.086 degrees of freedom.
Step 4: The P-value = 0.051
Step 5: Do not reject H0 at the 5% significance level (since 0.051 > 0.05).
Step 6: There is no significant evidence that the mean score of boys differs from the mean score of girls.

For our example, Welch's t-test gives a different result than the t-test that assumes equal variances. However, note that the P-value is quite close to the level of significance in both cases. It is always important to state and consider the P-value when any conclusion about significance is stated.

Confidence Interval: For the equal variances test, the 95% confidence interval for µ1 − µ2 is (0.135, 15.954). We are 95% confident that the difference in mean score between girls and boys falls between 0.135 and 15.954. Since 0 is not captured within this confidence interval, we can exclude 0 as a possible value for the difference in the means. Therefore, we find a significant difference in the mean scores between boys and girls. In other words, the data provide sufficient evidence that the mean scores for boys and girls are different.

For Welch's test, the 95% confidence interval for µ1 − µ2 is (-0.030, 16.119). We are 95% confident that the difference in mean score between girls and boys falls between -0.030 and 16.119. Since 0 is captured within this confidence interval, we cannot exclude 0 as a possible value for the difference. Therefore, we do not find a significant difference in the mean scores between boys and girls. In other words, the data do not provide sufficient evidence that the mean scores for boys and girls are different.

Like the hypothesis tests, the confidence intervals provide different answers in terms of the significance of the result. In this case, a close look at the confidence intervals is in order to determine how near 0 is to the confidence interval boundary.


Figure 2.17 SPSS Output: Two Independent Sample T-test and Confidence Interval

Note, again, that should you need to do a one sided hypothesis test, it would be necessary to convert the P-value obtained in the SPSS output to a P-value for a one-sided test. Furthermore, if you wish to calculate the upper bound for a 100(1 − α)% one sided left sided confidence interval, or the lower bound for a 100(1 − α)% one sided right sided confidence interval, you would need to run the SPSS commands with 100(1 − 2α)% as your confidence level in order to get the bounds you need in the SPSS output.

2.3 Inferences about Two Population Means Using the Paired t Procedure

Confidence intervals and hypothesis tests:

Assumptions:

1. The population of differences is normally distributed.
2. The sample of differences represents a random sample from the population of differences.

Paired-t Procedure:

Create a sample of differences (di = x1i - x2i) and then apply the one-sample t-procedure for the parameter µd = µ1 - µ2.

Example: A private agency is investigating a new procedure to teach typing. The following table gives the scores of eight students before and after they attended this course. The data are given in the SPSS data file typing.sav found on Blackboard.

Students 1 2 3 4 5 6 7 8

Before 81 75 89 91 65 70 90 69

After 97 72 93 110 78 69 115 75


1. Find a two sided 90% confidence interval for the mean difference (of “after – before” pairs) in the typing speed of all students participating in the course.

2. At the 10% level of significance, do the data provide evidence of a significant mean difference in paired “after – before” typing speeds?

Solution: Prior to creating the confidence interval and doing the hypothesis test, assumptions must be checked. For this problem we will assume that the sample of differences constitutes a random sample from the population of differences, and proceed to check the assumption of normality of the differences. In the SPSS data editor window, define the numerical variables before and after. Next, type the score data from before the course in the variable before, and the score data from after the course in the variable after. We need to define a new variable, diff, containing the differences in the scores after and before the course:

1. Select Transform>Compute
2. Type diff in the Target Variable box
3. Type after - before in the Numeric Expression box and click OK (see Figure 2.18)
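The same variable can also be created with syntax; a minimal sketch using the variable names defined above:

COMPUTE diff = after - before.
EXECUTE.

(COMPUTE builds the new column; EXECUTE forces the transformation to run immediately.)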

Figure 2.18: Defining a New Variable for the Differences

It is preferable to do at least two of a normal probability plot, a boxplot, and a histogram to help assess whether the population of differences is normally distributed. A normal probability plot or a histogram is preferable to a boxplot. If the sample size is small, the normal probability plot is sometimes considered the most useful of the three. How to do the normal probability plot in SPSS is illustrated below. The plot appears in Figure 2.19.

1. Select Analyze>Descriptive Statistics>Q-Q Plots
2. Select diff as Variables
3. The test distribution dropdown should show the “Normal” default
4. Click OK
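If you prefer syntax, these menu steps paste approximately the following PPLOT command (the subcommands shown are the dialog defaults):

PPLOT
  /VARIABLES=diff
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.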


Figure 2.19: Q-Q Plot of the Difference: After - Before in the Typing Scores

The normal probability plot compares the actual observed values against the expected values for a normally distributed random variable. If the distribution from which one samples is normal, then the points will fall close to the line. Here most points fall close to the line, so we expect that the assumption of a normal population for the differences is not strongly violated. However, there are only 8 values in our data set of differences, which really is very small, and this should be kept in mind. To obtain the confidence interval (and also the hypothesis test) requested, perform the following commands.

1. Choose Analyze>Compare Means>Paired-Samples T Test
2. Click after, then click before, to select after-before as Paired Variables (see Figure 2.20)
3. Click Options, type 90% for Confidence Interval, and click Continue
4. Click OK.
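The equivalent syntax is approximately the following; CI(.9000) is the 90% confidence level requested in Options:

T-TEST PAIRS=after WITH before (PAIRED)
  /CRITERIA=CI(.9000)
  /MISSING=ANALYSIS.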

Figure 2.20: Dialog Box for Paired-t Procedure

Paired Samples Statistics

                Mean     N  Std. Deviation  Std. Error Mean
Pair 1  After   88.6250  8  17.73566        6.27050
        Before  78.7500  8  10.43004        3.68758

Paired Samples Correlations

                        N  Correlation  Sig.
Pair 1  After & Before  8  .877         .004

Paired Samples Test

                      Mean     Std. Deviation  Std. Error Mean  90% CI Lower  90% CI Upper  t      df  Sig. (2-tailed)
Pair 1  After-Before  9.87500  9.94898         3.51749          3.21083       16.53917     2.807  7   .026

Figure 2.21: SPSS Output for the Paired-t Procedure

Confidence Interval: In Figure 2.21, a 90% confidence interval for the mean difference µafter - µbefore is given as 3.21 to 16.54. Since the interval does not cover 0, there is significant evidence that the mean difference in the “after – before” pairs differs from 0. We are 90% confident that the mean difference for the “after – before” pairs lies between 3.21 and 16.54 words per minute.

Hypothesis Test:

Step 1: Hypotheses: H0 : µafter = µbefore versus Ha : µafter ≠ µbefore

(Also can be stated as H0 : µd = 0 versus Ha : µd ≠ 0)
Step 2: We assumed a random sample of paired differences and found, from examining a normal probability plot of the differences (Figure 2.19), that normality of the population of differences seemed to be a reasonable assumption.

For Steps 3 to 6, SPSS output (Figure 2.21) from executing the paired-t procedure commands is needed.

Step 3: Test statistic: t = 2.807 with 7 ( = # pairs – 1) degrees of freedom
Step 4: P-value = 0.026
Step 5: Reject Ho at the 10% level of significance, since .026 < .10.
Step 6: The data provide sufficient evidence that the mean difference in typing speed for the “after – before” pairs differs from 0.

Note, again, that should you need to do a one-sided hypothesis test, it would be necessary to convert the P-value obtained in the SPSS output to a P-value for a one-sided test. Furthermore, if you wish to calculate the upper bound for a one-sided (1 – α) confidence interval of the form (−∞, U], or the lower bound for a one-sided interval of the form [L, ∞), you would need to run the SPSS commands with (1 – 2α) as your confidence level in order to get the bounds you need in the SPSS output.


Chapter 3 ANOVA

Objectives: After studying this chapter you should be able to

1. Compare two or more population means using the one-way ANOVA procedure;
2. Conduct a multiple comparison of means;
3. Make inferences about linear combinations of group means;
4. Run and interpret randomized block designs; and
5. Fit a 2-way ANOVA and interpret the results.

3.1 The Analysis of Variance F-test for Equality of k Population Means

In a single factor experiment, measurements are made on a dependent response variable of interest for several levels (treatments) of a factor of interest. ANOVA is a method of parametric inference that allows us to compare the k populations of interest from which samples have been obtained or allocated.

Assumptions:

1. The samples are randomly and independently selected from the populations.
2. The sampled populations are approximately normal.
3. The population variances are equal.

Model:

Xij = µ + αi + εij ,  i = 1, …, k,  j = 1, …, ni

Xij is a random dependent variable denoting the jth measurement on treatment i
k is the number of treatments
ni is the number of sample units for treatment i, and n = ∑ni is the total number of observations
αi = μi – μ is the treatment effect for treatment i
μi is the mean of the ith treatment
εij is the error in measurement, which can be explained through random effects not included in the model. We can write that we assume εij ~ N(0, σ2).

If all ni are equal, we say the model is balanced. Otherwise, the model is unbalanced.

Null hypothesis: H0 : µ1 = µ2 = ... = µk , where k is the number of treatments
Alternative hypothesis: Ha : At least two means differ (i.e. at least one of the means is different from the others).

x̄i – sample mean of the ith treatment
si – sample standard deviation of the ith treatment
x̄ – overall mean of the sample observations


Test statistic: F = MSTr/MSE,

where MSTr = SSTr/(k – 1) is the Mean Treatment Sum of Squares; SSTr, the Treatment Sum of Squares, measures the variation among the sample means. MSE = SSE/(n – k) is the Mean Error Sum of Squares; SSE, the Error Sum of Squares, measures the variation within the samples. Under the null hypothesis the test statistic has an F distribution with ν1 = treatment degrees of freedom = k – 1 and ν2 = error degrees of freedom = n – k.

Example: Suppose that 44 students were selected and randomly and independently assigned to four groups of 11 students each. Students in each group were taught statistics using a different method of programmed learning. A standard test was administered to the four groups and scored on a 15-point scale. The data are given in the SPSS data file program.sav found on Blackboard.

Method 1: 3, 5, 6, 8, 4, 3, 5, 6, 4, 6, 3 (lecture only)
Method 2: 5, 7, 7, 7, 6, 6, 8, 4, 6, 7, 5 (lecture and computer labs)
Method 3: 7, 5, 6, 8, 7, 6, 9, 8, 7, 7, 8 (lecture and computer labs and assignments)
Method 4: 4, 6, 6, 7, 6, 5, 5, 5, 6, 5, 4 (lecture and assignments)

1. Do the assumptions for a one-way analysis of variance seem to be met?
2. Determine whether there is a significant difference in the mean scores for the four methods.

Solution: The measurements in program.sav are organized in 4 different columns (one for each treatment). SPSS requires the data to be organized differently in order to perform the one-way analysis of variance F-test: it requires only two variables, one for all the scores (i.e. measurements) and the other for the index. Here is one way to create a new file with the data reorganized so that the ANOVA test can be done. (Always check your data to see if this is necessary.)

Figure 3.1: Restructure Data Wizard


1. Select Data>Restructure to open the Restructure Data Wizard (see Figure 3.1)
2. Check the Restructure selected variables into cases button and click Next
3. Check the One button for the number of variable groups and click Next
4. Select all variables for the four methods in the Variables to be transposed dialog box using the arrow, type Scores as the Target Variable, select None as Case Group Identification, and click Next (see Figure 3.2)
5. Check the One button for the number of index variables and click Next
6. Check Sequential numbers for the index values and click Finish
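As a syntax alternative to the wizard, the VARSTOCASES command performs the same restructuring. A minimal sketch, assuming the four columns in program.sav are named method1 to method4 (check the actual variable names in your copy of the file):

VARSTOCASES
  /MAKE Scores FROM method1 method2 method3 method4
  /INDEX=Index1(4).

(/MAKE stacks the four columns into the single variable Scores; /INDEX creates Index1 with the sequential values 1 to 4.)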

Figure 3.2: Variable to Cases – Select Variables Dialog Box

SPSS will create a new data file containing two variables: index1 (with the values 1 for method 1, 2 for method 2, 3 for method 3, and 4 for method 4) and scores (with all 44 scores). Save this file as programrestructured.sav and use it to solve the problem.

1. CHECK ASSUMPTIONS

We are told in the question that students were assigned randomly and independently to the treatment groups. The following commands will be used to check the normality and equal variances assumptions.

(a) Using Analyze>Descriptive Statistics>Explore
(b) Dependent List: Scores
(c) Factor List: Index1
(d) Statistics tab: Check Descriptives
(e) Continue
(f) Plots tab: In Boxplots, choose Factor levels together
(g) Check Histogram
(h) Check Normality plots with tests
(i) Continue
(j) OK
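These Explore steps correspond approximately to the following syntax (/NOTOTAL suppresses the output for the combined sample):

EXAMINE VARIABLES=Scores BY Index1
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.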


Treatment Mean Standard Deviation

Method 1 4.82 1.601

Method 2 6.18 1.168

Method 3 7.09 1.136

Method 4 5.36 0.924


Figure 3.3: Descriptives, Histograms, Probability Plots, and Boxplots for the 4 Different Scoring Methods

Note that with the very small sample sizes, none of the plots obtained are particularly informative (Figure 3.3). There is some indication that the normality assumption might be violated. The samples are not symmetric. However, since the sample sizes are equal, and the distributions are not extremely skewed or long-tailed, and there are no glaring outliers, we can still use the ANOVA F-test as it is robust to this situation. To check the equality of the population variances, the following rule of thumb may be used. If the ratio of the largest sample standard deviation to the smallest sample standard deviation is less than 2, the assumption of equal standard deviations is plausible. In this case the ratio = s1/s4 = 1.73 < 2. Thus, the conditions for one-way ANOVA seem to be met.

2. Perform the ANOVA F-test:

SPSS Commands and Output for a One-way ANOVA test

i. Select Analyze>Compare Means>One-way ANOVA...
ii. Select scores as Dependent List(s) and index1 as Factor (see Figure 3.4)
iii. Click Options and check the Descriptive, Homogeneity of variance test, and Means plot buttons, and click Continue (see Figure 3.5)
iv. Click OK
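The equivalent syntax is approximately:

ONEWAY Scores BY Index1
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /PLOT MEANS
  /MISSING ANALYSIS.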

Figure 3.4: Dialog Box for the One-Way ANOVA Procedure


Figure 3.5: Options Dialog box for the One-Way ANOVA Procedure

The descriptive option provides standard deviations to help check for equal variances. In addition, the standard deviations and means are needed to perform multiple comparisons, if appropriate. The homogeneity of variances check performs a test for equal variances called the Levene test (see Figure 3.6). The means plot provides a graphical look at how far apart the means are (see Figure 3.7).

Figure 3.6 SPSS Output for a One-Way ANOVA


Figure 3.7: Means Plot

Levene Test: P-value = 0.298. There is certainly no strong evidence against the assumption that the population variances are equal. This supports the result found with the rule of thumb approach.

Descriptive Statistics Table: the sample mean scores are 4.82, 6.18, 7.09, and 5.36 for Methods 1 to 4 respectively. Note that the 95% confidence intervals for µ1 and µ3 do not overlap, and this would indicate to us that students in Method 3 (lecture and computer labs and assignments) classes obtained, on average, superior scores to students in Method 1 (lecture only) classes. We will look at a more statistically apt way of making multiple comparisons of the treatment means later on.

Means Plot: also illustrates the differences among the means.

Proper Write-Up of the Hypothesis Test, using the information from above.

Step 1: Hypotheses: H0 : µ1 = µ2 = ... = µ4 versus Ha : At least two means are different. We decide to pre-choose a level of significance of α = 0.01.
Step 2: Assumptions: These were checked above. The data appear suitable for the ANOVA test.

Step 3 and Step 4 (test statistic and P-value) (Commands and Output appear below)

The F-statistic=7.126 with degrees of freedom ν1 = 3, ν2 = 40 The P-value=0.001 Step 5: Decision Reject Ho since P-value < α (that is: 0.001 < 0.01)

Step 6: Interpretation This indicates that there is strong evidence to suggest that not all mean scores of the four methods are equal. (Further investigation will be undertaken later in this chapter.)


If all residuals come from a normal distribution (that is, if the scores are normally distributed within each treatment group), then it is expected that a normal probability plot of the residuals will have points close to the line. The following commands present an alternative way to obtain one-way ANOVA results (the alternative format in the output is not of interest here) while having SPSS store the unstandardized predicted values and the unstandardized residuals in the original data file (these are of interest). Once we have these columns in the data file, a normal probability plot can be made with the residuals (see Figure 3.8).

(a) Analyze > General Linear Model > Univariate
(b) Dependent Variable: Scores
(c) Fixed Factor(s): Index1
(d) Save: Predicted Values: Unstandardized
(e) Save: Residuals: Unstandardized
(f) Continue
(g) OK

(a) Analyze > Descriptive Statistics > Q-Q Plots
(b) Variables: RES_1 (Residuals for Scores)
(c) Test Distribution: Normal
(d) OK
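The first block of steps corresponds approximately to the following syntax; /SAVE=PRED RESID adds the unstandardized predicted values (PRE_1) and residuals (RES_1) to the data file:

UNIANOVA Scores BY Index1
  /SAVE=PRED RESID
  /DESIGN=Index1.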

We see here that most of the residuals are close to the line. We do identify a couple that are outlying, one at each end. Examination of the RES_1 column in the data file reveals these to be the observation in group 1 with value 8 (residual 3.18) and the observation in group 2 with value 4 (residual -2.18). Both of these values can be readily seen on the histograms above. They are not so far away that we are worried. However, we remain aware of our small sample sizes.

Figure 3.8: Normal Probability Plot of Residuals


3.2 Linear Combinations and Multiple Comparisons of Means

Contrasts: Linear combinations of the group means have the form

γ = C1µ1 + C2µ2 + ... + Ckµk

If the coefficients add to zero (C1 + ... + Ck = 0), then the linear combination is called a contrast. Some important concepts to note:

1. A (1 – α) confidence interval for γ:

g ± tα/2 · sp · √(C1²/n1 + ... + Ck²/nk)

where g = C1x̄1 + ... + Ckx̄k is the sample estimate of the contrast, sp = √MSE is the pooled estimate for σ, and tα/2 is the critical value of the t distribution with d.f. = n1 + ... + nk – k.

2. Test statistic for testing H0 : γ = δ:

to = (g – δ) / (sp · √(C1²/n1 + ... + Ck²/nk))

Example: Suppose we are interested in whether the method that involves all 3 components is clearly superior. We compare it with the average of the other three methods.

We define the contrast γ = µ3 - .33µ2 - .33µ1 - .33µ4:

The coefficients for this contrast are C1 = -.33, C2 = -.33, C3 = 1, C4 = -.33. To test if the extra learning components help:

(a) Select Analyze>Compare Means>One-way ANOVA...
(b) Select scores as Dependent List(s) and index1 as Factor
(c) Click Contrasts
(d) Check the Polynomial button and choose Linear for Degree
(e) Type the coefficients one by one in the Coefficients box, clicking Add each time, and finally click Continue (see Figure 3.9)
(f) Click OK
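The contrast can also be requested with syntax; the coefficients are listed in ascending order of the factor levels:

ONEWAY Scores BY Index1
  /CONTRAST=-.33 -.33 1 -.33.

(If the Polynomial box is checked as in step (d), SPSS also adds a /POLYNOMIAL=1 subcommand, which only appends trend output and does not change the contrast test.)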


Figure 3.9: SPSS Dialog Box for Contrasts

The SPSS output is given in Figure 3.10. The estimated value of the contrast is 1.69 with to = 3.952, df = 40, and a very small P-value (≈ 0.000) (when assuming equal variances). Using the information in the table we can also find a 95% confidence interval for γ:

1.69 ± (2.021)(0.428) = 1.69 ± 0.8648

where 0.428 is the standard error of the contrast from the output (select Transform>Compute and then choose the SPSS function IDF.T(0.975,40) to return the critical value 2.021). Since the 95% confidence interval contains only positive numbers (i.e. adding learning components to the lecture increases the mean results) and the P-value is less than 0.05, we conclude that the extra learning components increase the mean score on the exam.

A 6-step one-sided (right-sided) hypothesis test could also be performed as follows. We'll assume a pre-chosen α of 0.01.

Step 1: Ho: µ3 - .33µ2 - .33µ1 - .33µ4 <= 0 versus Ha: µ3 - .33µ2 - .33µ1 - .33µ4 > 0
Step 2: Assumptions are the standard ANOVA ones; we have already discussed them for this data and determined that we can proceed with testing.
Step 3: to = 3.952 with 40 degrees of freedom.
Step 4: P-value = P(t > 3.952) ≈ .000/2 ≈ 0
Step 5: Since the P-value <= 0.01, we reject Ho.
Step 6: We have evidence that the extra learning components increase the mean score on the exam.

Contrast Coefficients

          Index1
Contrast  1     2     3  4
1         -.33  -.33  1  -.33

Contrast Tests

                                         Contrast  Value of Contrast  Std. Error  t      df      Sig. (2-tailed)
Scores  Assume equal variances           1         1.69a              .428        3.952  40      .000
        Does not assume equal variances  1         1.69a              .406        4.167  18.490  .001

a. The sum of the contrast coefficients is not zero.

Figure 3.10: Contrast: check for superiority of the lecture, computer lab and assignment method


Multiple Comparisons: Sometimes we are interested in more than one contrast. In the situation where an ANOVA test has an overall significant finding for the model, telling us that at least one of the means is different from the others, the creation of individual confidence intervals for all pairwise comparisons is of interest.

Example: Check the significance of the overall model for the test above. If appropriate, create pairwise comparison confidence intervals for all possible pairs, so that each confidence interval of interest has an individual level of confidence of 95%. This approach is known as the Fisher LSD approach.

Solution: Since our F test is significant for our test of overall model significance, it is appropriate to proceed with the calculation of the Fisher LSD intervals. Output is shown in Figure 3.11.

Computer Commands and Output

(a) Select Analyze>Compare Means>One-Way ANOVA...
(b) Dependent List: Scores and Factor: Index1
(c) Click Post Hoc
(d) Choose LSD under Equal Variances Assumed on the Post Hoc Multiple Comparisons screen
(e) Set the significance level to 0.05
(f) Click Continue and then OK
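The equivalent syntax is approximately:

ONEWAY Scores BY Index1
  /POSTHOC=LSD ALPHA(0.05).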

Multiple Comparisons
Dependent Variable: Scores
LSD

(I) Index1  (J) Index1  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
1           2           -1.364*                .525        .013  -2.43         -.30
1           3           -2.273*                .525        .000  -3.33         -1.21
1           4           -.545                  .525        .305  -1.61         .52
2           1           1.364*                 .525        .013  .30           2.43
2           3           -.909                  .525        .091  -1.97         .15
2           4           .818                   .525        .127  -.24          1.88
3           1           2.273*                 .525        .000  1.21          3.33
3           2           .909                   .525        .091  -.15          1.97
3           4           1.727*                 .525        .002  .67           2.79
4           1           .545                   .525        .305  -.52          1.61
4           2           -.818                  .525        .127  -1.88         .24
4           3           -1.727*                .525        .002  -2.79         -.67

*. The mean difference is significant at the 0.05 level.

Figure 3.11: Multiple Comparison output for LSD Approach


The table gives the P-value of the specific test and the 95% confidence interval for each pairwise comparison. Recall that the comparisons are not considered significant when the confidence intervals contain 0 (or, in our case, when the P-value > 0.05). We see that the differences between μ1 and μ2, μ1 and μ3, and μ3 and μ4 present as significant. The other 3 pairwise intervals do not present as significant. Each of these individual pairwise comparison intervals has 95% confidence.

In general, the individual confidence level is the confidence we have that any particular confidence interval of interest contains the difference between the corresponding population means. And, if several confidence intervals are of interest, the family wise (also called experiment wise or simultaneous or overall) confidence level is the confidence we have that all the confidence intervals of simultaneous interest contain the differences between the corresponding population means. The more comparisons we do, the more likely it becomes that we make a wrong decision (that is, that we have at least one wrong conclusion in our results).

*Suppose we made three comparisons using three independent tests. Then the joint probability of not making a type 1 error on all three tests is (.95)^3 = .8574, and the probability of making at least one type 1 error is then 1 - .8574 = .1426. Now, in our case, our tests are not independent, because MSE is used in each test and the same sample means appear in various tests. It can be shown (this is beyond the scope of our course) that the error involved is even greater in this case.

The decision we must make is whether to control for the family wise error rate or for the individual error rates. Several strategies exist to handle this situation. Remember, though, that multiple comparisons should only be made if they are of interest (i.e. we would not make multiple comparisons if the overall F model was not significant). In order to do so, an acceptable individual error rate or an acceptable experiment wise (family wise) error rate, α, must be decided. Fisher’s LSD: This test is used for pairwise comparisons only. Individual t tests, each at some chosen level, α, are performed. The level of overall significance for all tests performed will be (often considerably) larger than the individual α. Tukey-Kramer: This test is specifically for pairwise comparisons in an ANOVA setting. It uses a formula that is based on the q distribution (the studentized range distribution), a special type of right-skewed curve. The formula for the Tukey-Kramer approach is

(x̄i – x̄j) ± (qα/√2) · sp · √(1/ni + 1/nj)

where qα is the q value from a q distribution with area α to its right*

*For a balanced design, all these intervals have the same width, and are known as the Tukey HSD intervals. With these equal sample sizes, the level of family wise confidence for the Tukey interval is (1 – α). For an unbalanced design, the Tukey-Kramer intervals do not have the same width. With unequal sample size, the level of family wise confidence is “conservative”, and higher than (1 – α). Bonferroni: This multiple comparison method can be used to test for all possible contrasts of interest in very general situations when we want to control family wise error rate. Contrasts of interest are decided ahead of time. It uses individual significance levels of α* = α/g, where g is the number of contrasts of interest and α is the overall error rate.


When used with pairwise comparisons, Bonferroni intervals have the following formula:

(x̄i – x̄j) ± tα*/2 · sp · √(1/ni + 1/nj)

where α* = α/m and m is the number of pairwise comparisons

For a balanced design, all Bonferroni intervals will have the same width. In this case, the Tukey HSD approach results in narrower intervals than the Bonferroni approach; the Bonferroni approach is more conservative than the Tukey-Kramer approach.

Scheffe: This multiple comparison method can also be used to test for all possible contrasts of interest, for both equal and unequal sample sizes. It offers an overall α level of protection, regardless of how many contrasts are tested. It is best used post-hoc rather than planned. Scheffe intervals are wider than Tukey intervals when used for pairwise comparisons.

Question: For our example above, produce LSD, Bonferroni, Tukey, and Scheffe comparison intervals (see Figure 3.12). Tell SPSS to use α = .05.

Computer Commands and Output

(a) Select Analyze>Compare Means>One-Way ANOVA...
(b) Dependent List: Scores and Factor: Index1
(c) Click Post Hoc
(d) Choose LSD, Bonferroni, Tukey, and Scheffe under Equal Variances Assumed on the Post Hoc Multiple Comparisons screen
(e) Set the significance level to 0.05**
(f) Click Continue and then OK

**Be careful here. For LSD, SPSS will use an individual error rate of 0.05 for each interval. For the Bonferroni, Tukey, and Scheffe intervals, it will use a family wise error rate of 0.05.

The table gives the P-value of the specific test and the 95% confidence interval for all methods. Recall that the comparisons are not considered significant when the confidence intervals contain 0 (or, in our case, when the P-value > 0.05). The LSD approach gives the narrowest confidence intervals, but remember, these are intervals where the deciding factor is the individual error rate of 0.05. This method indicates that learning method 3 (lecture, computer lab, and assignments) is superior to method 1 (lecture only) and to method 4 (lecture and assignments). It also indicates that learning method 2 (lecture and computer lab) is superior to learning method 1 (lecture only).

The other three methods use a family wise error rate of 0.05. The Tukey intervals are the narrowest, the Bonferroni the next narrowest, and the Scheffe the widest. The Tukey, Bonferroni, and Scheffe methods all indicate that learning method 3 (lecture, computer lab, and assignments) leads to a significantly higher mean score on the standardized test than learning methods 1 (lecture only) and 4 (lecture and assignments). (All other mean scores are not significantly different at an experiment wise error rate of 5%.)
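The steps above correspond approximately to this syntax, which requests all four sets of intervals at once:

ONEWAY Scores BY Index1
  /POSTHOC=TUKEY SCHEFFE BONFERRONI LSD ALPHA(0.05).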


In order to summarize the results of a multiple comparison it is helpful to create a diagram showing the factor levels, ranked by the sample means of the dependent variable, and to underline those levels that are NOT significantly different, as below. This diagram is for the Bonferroni, Tukey, and Scheffe results.

1       4       2       3
4.82    5.36    6.18    7.09
--------------------
                ------------

Scores

                        Subset for alpha = 0.05
            Index1  N   1     2
Tukey HSDa  1       11  4.82
            4       11  5.36
            2       11  6.18  6.18
            3       11        7.09
            Sig.        .061  .322
Scheffea    1       11  4.82
            4       11  5.36
            2       11  6.18  6.18
            3       11        7.09
            Sig.        .098  .404

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 11.000.

Multiple Comparisons
Dependent Variable: Scores

Method      (I) Index1  (J) Index1  Mean Difference (I-J)  Std. Error  Sig.   95% CI Lower  95% CI Upper
Tukey HSD   1           2           -1.364                 .525        .061   -2.77         .04
            1           3           -2.273*                .525        .001   -3.68         -.86
            1           4           -.545                  .525        .728   -1.95         .86
            2           1           1.364                  .525        .061   -.04          2.77
            2           3           -.909                  .525        .322   -2.32         .50
            2           4           .818                   .525        .414   -.59          2.23
            3           1           2.273*                 .525        .001   .86           3.68
            3           2           .909                   .525        .322   -.50          2.32
            3           4           1.727*                 .525        .011   .32           3.14
            4           1           .545                   .525        .728   -.86          1.95
            4           2           -.818                  .525        .414   -2.23         .59
            4           3           -1.727*                .525        .011   -3.14         -.32
Scheffe     1           2           -1.364                 .525        .098   -2.90         .17
            1           3           -2.273*                .525        .001   -3.81         -.74
            1           4           -.545                  .525        .783   -2.08         .99
            2           1           1.364                  .525        .098   -.17          2.90
            2           3           -.909                  .525        .404   -2.44         .62
            2           4           .818                   .525        .497   -.72          2.35
            3           1           2.273*                 .525        .001   .74           3.81
            3           2           .909                   .525        .404   -.62          2.44
            3           4           1.727*                 .525        .021   .19           3.26
            4           1           .545                   .525        .783   -.99          2.08
            4           2           -.818                  .525        .497   -2.35         .72
            4           3           -1.727*                .525        .021   -3.26         -.19
LSD         1           2           -1.364*                .525        .013   -2.43         -.30
            1           3           -2.273*                .525        .000   -3.33         -1.21
            1           4           -.545                  .525        .305   -1.61         .52
            2           1           1.364*                 .525        .013   .30           2.43
            2           3           -.909                  .525        .091   -1.97         .15
            2           4           .818                   .525        .127   -.24          1.88
            3           1           2.273*                 .525        .000   1.21          3.33
            3           2           .909                   .525        .091   -.15          1.97
            3           4           1.727*                 .525        .002   .67           2.79
            4           1           .545                   .525        .305   -.52          1.61
            4           2           -.818                  .525        .127   -1.88         .24
            4           3           -1.727*                .525        .002   -2.79         -.67
Bonferroni  1           2           -1.364                 .525        .079   -2.82         .09
            1           3           -2.273*                .525        .001   -3.73         -.81
            1           4           -.545                  .525        1.000  -2.00         .91
            2           1           1.364                  .525        .079   -.09          2.82
            2           3           -.909                  .525        .548   -2.37         .55
            2           4           .818                   .525        .764   -.64          2.28
            3           1           2.273*                 .525        .001   .81           3.73
            3           2           .909                   .525        .548   -.55          2.37
            3           4           1.727*                 .525        .013   .27           3.19
            4           1           .545                   .525        1.000  -.91          2.00
            4           2           -.818                  .525        .764   -2.28         .64
            4           3           -1.727*                .525        .013   -3.19         -.27

*. The mean difference is significant at the 0.05 level.

Figure 3.12: SPSS Output for the Multiple Comparison of Means Procedure

3.3 Randomized Block Designs

In a single factor ANOVA, once subjects have been chosen, each subject is randomly and independently allocated to levels (treatments) of the factor. Sometimes subjects exhibit heterogeneity with respect to a factor other than the treatment factor of interest. Therefore, we might “block” on that other factor. Then, within each homogeneous block, the treatments can be randomly assigned. In general, a randomized complete block design will have b levels of a “blocking” factor and k levels of a treatment factor of interest. Including the block factor allows us to correct for its influence on the dependent treatment variable and to account for the variability it causes.


Note that blocks can sometimes be subjects who are tested at each of the k different levels of the treatment factor of interest. When k = 2 (the treatment factor of interest has 2 levels), the randomized complete block test can be shown to be analogous to the familiar paired t test. When k > 2, an F test is used. A block factor is only of secondary interest, but as long as blocking is necessary, it should present as significant.

Assumptions:

1. The samples are randomly and independently selected.
2. The populations are approximately normal for each factor/block combination.
3. The population variances for all factor/block combinations are equal.

Model:

Xij = µ + αi + βj + εij , i = 1,…,k, j = 1,…,b

Xij is a random dependent variable denoting the ith treatment measurement in block j
k is the number of treatments
b is the number of blocks
αi is a parameter that indicates the effect for treatment i
βj is a parameter that indicates the effect for block j
μij = µ + αi + βj is the mean of the ijth treatment/block combination
εij is the error in measurement, which can be explained through random effects not included in the model. We can write that we assume εij ~ N(0, σ2).

Example:

Three language tests shall be compared.

1. WAIS vocabulary (linguistic)

2. Willner Unusual Meanings Vocabulary (WUMV) (pragmatic)

3. Willner-Sheerer Analogy Test (WSA) (pragmatic)

It is assumed that subjects should score lower on the WUMV and WSA compared to the WAIS. Twelve people took the three tests and their scores are recorded in the table below.

Subject  WAIS  WUMV  WSA

1 15 12 11

2 10 11 8

3 6 4 3

4 7 7 5

5 9 6 6

6 16 14 10

7 11 10 7

8 13 9 4

9 12 10 8

10 10 8 7

11 11 9 9

12 14 11 10


It is reasonable to assume that the outcome on a test depends on the difficulty of the test and the linguistic ability of the person taking the test. In order to account for the linguistic ability of the different people, a Randomized Block Design is chosen, with the test type as the treatment variable and the person as the block variable for the analysis. The data can be found in languagetest.sav on Blackboard. The file includes three variables: subject (ranging from 1 to 12), test (with levels WAIS, WUMV, or WSA), and score (the test result for each subject/test pair).

Questions:

1. State the model for the scores on the three language tests.
2. Obtain a line chart showing the means for each block/treatment combination. Does the graph indicate a treatment effect? A block effect?
3. Test at a significance level of 5% whether the mean scores for the three tests are all the same.
4. Should you do a multiple comparison for the mean scores? Explain. If yes, do one.
5. Use the residuals to check if the model assumptions are met.

Solution:

1. Xij = µ + αi + βj + εij , i = 1,…,3, j = 1,…,12

Xij is a random dependent variable denoting the score of person j on test i
k is the number of tests
b is the number of persons
αi is a parameter that indicates the effect for test i
βj is a parameter that indicates the effect for person j
μij = µ + αi + βj is the mean of the ijth test/person combination
εij is the error in measurement, which can be explained through random effects not included in the model. We can write that we assume εij ~ N(0, σ2).

2. To obtain the line chart

(a) Choose Graphs>Legacy Dialogs>Line>Multiple>Define
(b) In the Lines Represent box choose Other Statistics (e.g. mean) for the variable score
(c) Choose subject for Category Axis
(d) Choose test for Define Lines by (see Figure 3.13)
(e) Click OK
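These menu steps paste approximately the following legacy GRAPH command (the first BY variable gives the category axis, the second defines the lines):

GRAPH
  /LINE(MULTIPLE)=MEAN(score) BY subject BY test.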

In the Line Graph (see Figure 3.14) each line represents the results of one test. The lines are more or less parallel to each other, indicating that there is no interaction between subject and test. For each subject, WAIS test score generally falls above the WUMV test score, and that, in turn, generally falls above the WSA test score, so we should expect to find an effect of the test on the mean score. Since the mean scores for the different subjects vary, we should also expect to find an effect of subject on the mean score.


Figure 3.13: Define Multiple Line: Summaries for Groups of Cases

Figure 3.14: Line Graph for scores


3. For the test (a) H0: α1 = α2 = α3 = 0 versus Ha: at least one is not 0, level of significance = 0.05

(b) To obtain the result from SPSS

a) Go to Analyze>General Linear Model>Univariate...
b) Choose score as the Dependent Variable
c) Choose test and subject as Fixed Factors
d) Click Model, and choose Custom
e) For Build Terms choose Main effects from the pull-down menu
f) Move test and subject into the Model box and click Continue
g) Click OK

(c) In the SPSS output (see Figure 3.15), you can see that F = 31.798 with df1 = k – 1 = 2 and df2 = (k – 1)(b – 1) = 22, and P-value < 0.001.

(d) Since the P-value is less than α, we conclude, at a level of significance of 0.05, that the mean scores for the tests are not all the same.
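The menu steps in (b) correspond approximately to the following syntax; listing only the main effects on /DESIGN reproduces the custom model chosen in steps d) to f):

UNIANOVA score BY test subject
  /DESIGN=test subject.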

Figure 3.15: SPSS Output for RBD Analysis

4. Since we found that at least one of the tests has a mean score different from the others, we now ask ourselves “where are the differences?”. In order to control the experiment wise error rate, we should do a multiple comparison of means. We will use the Tukey approach here, as it will result in narrower intervals.

Commands to do the multiple comparison are:

(a) Analyze>General Linear Model>Univariate... and choose Post Hoc
(b) Move test over into the Post Hoc Tests for box and click Tukey
(c) Click Continue and then OK.
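The equivalent syntax is approximately:

UNIANOVA score BY test subject
  /POSTHOC=test(TUKEY)
  /DESIGN=test subject.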

In Figure 3.16, observe that each pairwise comparison is labelled with a star, indicating that at an experiment wise error rate of 5% all mean test scores are significantly different from each other. From the confidence intervals we can conclude that the WAIS is the easiest test (highest scores) and the WSA is the hardest test (lowest scores).


Figure 3.16: SPSS Output for Tukey’s Multiple Comparison for the Variable test

5. We can use the residuals and make a normal probability plot of them to see if it is reasonable to assume that the error comes from a normal distribution (see Figure 3.17). To save the residuals and the fitted values,

(a) Go to Analyze>General Linear Model>Univariate
(b) Dependent Variable: score
(c) Fixed Factor(s): subject and test
(d) Save: Predicted Values: Unstandardized and Residuals: Unstandardized
(e) Continue
(f) OK

Then, to make the plot:

(a) Analyze > Descriptive Statistics > Q-Q Plots
(b) Variables: RES_1 (Residuals for score)
(c) Test Distribution: Normal
(d) OK

Figure 3.17: Normal Probability Plot of the Residuals


Most of the residuals are close to the line with the exception of 2 outlying values. These belong to subject 8, who obtained a 13 in WAIS, 9 in WUMV, and 4 in WSA. This subject, with the second smallest WSA score, has the widest range in values between WSA and WAIS. Although s/he follows the expected pattern of doing best in WAIS and worst in WSA, s/he obtained a higher score on WAIS than we might have expected. Given the robustness of the RCB design to slight departures from normality, the test is still an appropriate one to perform. And again, we note our small sample sizes.

3.4 2-Way ANOVA

In randomized block designs we allow for two factors to influence the outcome of the dependent variable, but our interest is only in the effect of the treatment variable, and we only include the block variable to account for its influence on the dependent variable. We also assume that the treatment effect is the same for each block. In 2-way ANOVA we include two factor variables (say A and B) in the model, and their combination determines the treatment, e.g. the choice of seed + choice of fertilizer determines the treatment of a certain plot. We no longer assume that the effect of one factor is the same for each level of the other factor. Interaction is possible.

Assumptions:

1. The samples are randomly and independently selected.
2. The populations are approximately normal for each treatment (i.e. factor combination).
3. The population variances for all treatments are equal.

Model:

Xijl = µ + αi + βj + (αβ)ij + εijl , i = 1,…,I, j = 1, …, J, l = 1, …, L

Xijl is a random dependent variable denoting the lth treatment measurement at level i of Factor A and level j of Factor B
I is the number of levels of Factor A
J is the number of levels of Factor B
L is the number of observations at treatment combination ij
αi is the effect for level i of Factor A
βj is the effect for level j of Factor B
(αβ)ij is the interaction effect of level i of Factor A and level j of Factor B
εijl is the error in measurement, which can be explained through random effects not included in the model. We can write that we assume εijl ~ N(0, σ2).

Note that we will only consider the balanced two-way ANOVA design (with the same number of sample units for each (level i of factor A, level j of factor B) combination.


Example: The distance a golf ball is hit depends on the type of club used, and on the brand of the ball used. A golf player investigates this effect by hitting balls of 4 different brands (A, B, C, D) with a driver and a five iron. For each hit the distance the ball travelled is measured. The results can be found in golf.sav on Blackboard.

1. State the model for distance in dependency on the club used and the brand of the ball. Include interaction effects.

2. Obtain two line charts showing the means for each club/brand combination. One should have club on the horizontal axis and ball on the lines, and the other should have ball on the horizontal axis and club on the lines. Do these graphs indicate an interaction effect? a club effect? a brand effect?

3. Obtain clustered boxplots to check the assumptions.
4. Create a two-way ANOVA table with which you will perform inference for your data.
5. Analyze the residuals from this model for normality.
6. Test if the model is useful in describing the distance a ball was hit with a driver or five iron.
7. Test if the two factors club and brand of the ball interact.
8. Test if the brand of the balls affects the mean distance the balls were hit.
9. Test if the mean distances the balls were hit are significantly different for the driver and the five iron.
10. Is it useful to do multiple comparisons for comparing the mean distances for the different brands of balls and/or different clubs? Do any necessary post hoc analysis.

Solution:

1. Xijl = µ + αi + βj + (αβ)ij + εijl , i = 1,…,4, j = 1, …, 2, l = 1, …, 4

Xijl is a random dependent variable denoting the distance a ball of brand i was hit by a club of type j for repetition l
αi is the effect for ball brand i
βj is the effect for club type j
(αβ)ij is the interaction effect of ball brand i and club type j
εijl is the error in measurement, which can be explained through random effects not included in the model. We can write that we assume εijl ~ N(0, σ2).

2. To obtain the line charts

a. Choose Graphs>Legacy Dialogs>Line>Multiple>Define
b. In the Lines Represent box choose Other statistic (e.g. mean) for the variable distance
c. Choose ball as the Category Axis
d. Choose club for Define Lines by
e. Click OK

See Figure 3.18 for the results.


Figure 3.18: Line Chart for mean distance

The difference between the average of the means for the driver and the average of the means for the five iron indicates a likely effect of the factor club. The difference between the average of the means for the four brands indicates a likely effect of the brand of the balls. Since the lines are not parallel we should expect to find an interaction effect within the model.

3. To find a clustered boxplot with SPSS:

a. Select Graphs>Boxplot...>Clustered>Define
b. Set Variable to distance, Category Axis to club, and Define Clusters by to ball
c. Click OK


Figure 3.19: Clustered Boxplot for Distance Dependent on Choice of Club and Brand of Ball

All boxes look somewhat symmetric (see Figure 3.19), but the lengths of the boxes do vary. We find no notable evidence against the assumption that the data is normally distributed, but we should look further at the estimated standard deviations to check if the assumption of equal variances is reasonable. One way to find the descriptive statistics split by the factors follows.

(a) Select Data>Split File
(b) Select Compare groups
(c) Move ball and club into the Groups Based on box
(d) Click OK
(e) Select Analyze>Descriptive Statistics>Descriptives
(f) Move distance into the Variable(s) box
(g) Click OK
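The same split-file descriptives can be produced with syntax (SORT CASES is needed before SPLIT FILE; remember to turn the split off afterwards):

SORT CASES BY ball club.
SPLIT FILE LAYERED BY ball club.
DESCRIPTIVES VARIABLES=distance.
SPLIT FILE OFF.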

In the output (see Figure 3.20) the largest standard deviation is 9.24, which is more than 4 times larger than the smallest (1.96). Therefore we have indication that the variances might not be the same.

Figure 3.20: Descriptive Statistics for the Distance of Each Treatment (club/ball combination)


4. To obtain all the results we need to conduct a 2-way ANOVA (see Figure 3.21).

(You will need to make sure that you follow the path Data>Split File and change back to “Analyze all cases, do not create groups” prior to following the commands below.)

a. Select Analyze>General Linear Model>Univariate
b. Select distance as the dependent variable
c. Select club and ball as Fixed Factors
d. Click OK
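The equivalent syntax is approximately the following (the /DESIGN line spells out the full factorial model, which is also the default):

UNIANOVA distance BY club ball
  /DESIGN=club ball club*ball.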

Figure 3.21: ANOVA Table for Distance Depending on Club and Ball

5. To obtain residuals to use, the commands above are used with an additional command to save the residuals, viz.

a. Select Analyze>General Linear Model>Univariate
b. Select distance as the dependent variable
c. Select club and ball as Fixed Factors
d. Save button: under Residuals, check Standardized
e. Continue
f. Click OK

In the data file, a column labeled ZRE_1 will have appeared. We will make a normal probability plot of these residuals (see Figure 3.22).

(a) Analyze > Descriptive Statistics > Q-Q Plots
(b) Select Standardized Residuals as the Variable
(c) Under Test Distribution, ensure Normal appears
(d) Click OK


Figure 3.22: Normal Probability Plot of the Residuals

Most of the points are fairly close to the straight line. It is reasonable to assume that the residuals come from a normal distribution (that is, that the distance variable is normally distributed for each (club type, ball brand) combination).

6. Testing for model (see Figure 3.21 for output):

(a) Hypotheses: H0: all treatment (club/ball combination) means are equal versus Ha: at least one mean is different, α = 0.05
(b) The assumption of normality seems to be met, but the variances might not be the same
(c) Fo = 140.689 with df1 = 7 and df2 = 24
(d) P-value < 0.001 < α = 0.05
(e) Reject H0; the data provide sufficient evidence that the treatment means are not all the same for all club/ball combinations. The 2-way model helps to explain the distance a ball was hit.

7. Testing for interaction of club and ball (see Figure 3.21 for output):

a. Hypotheses: H0: (αβ)11 = ... = (αβ)42 = 0 versus Ha: at least one interaction term is not 0, α = 0.05
b. See above for assumptions
c. Fo = 7.459, with df1 = 3 and df2 = 24
d. P-value = 0.001 < α = 0.05
e. Reject H0; the data provide sufficient evidence that not all interaction terms are zero, i.e. interaction exists: certain club/ball combinations seem to be particularly good/bad in a way that is not explained through the main effects of club and ball. It appears that certain club/ball combinations result in significantly different mean distances that cannot be explained by just adding the main effect means for that club and ball.

8. Testing for main effect of ball (see Figure 3.21 for output):

a. Hypotheses: H0: α1 = ... = α4 = 0 versus Ha: at least one term is not 0, α = 0.05
b. See above for assumptions
c. Fo = 7.817, with df1 = 3 and df2 = 24
d. P-value = 0.001 < α = 0.05
e. Reject H0; the data provide sufficient evidence that the mean distances are not all the same for the different brands, averaging over all clubs.


9. Testing for main effect of club (see Figure 3.21 for output):

a. Hypotheses: H0: β1 = β2 = 0 versus Ha: at least one term is not 0, α = 0.05
b. See above for assumptions
c. Fo = 938.996 with df1 = 1 and df2 = 24
d. P-value < 0.001 < α = 0.05
e. Reject H0; the data provide sufficient evidence that the mean distances are not all the same for the different clubs, averaging over all brands.

The 2-way ANOVA analysis confirmed that the distance the balls travelled depended on the choice of club and the brand of the balls, and in addition an interaction effect was confirmed: certain club/brand combinations work particularly well/badly beyond what can be explained through the main effects. All these results have to be treated carefully because the assumption that the variances are equal seems to be violated. Larger sample sizes could help to make a decision on this problem.

10. With the 2-way ANOVA analysis we confirmed the presence of an interaction effect of club and ball on the distance a golf ball was hit. This means that for the different clubs, different balls were best/worst (the mean distance hit fell above/below the expected value); or, for the different balls, different clubs were best/worst.

To further analyze which type of ball was best/worst for each club, we should, for each club type, conduct a pairwise multiple comparison of the mean distances attained by all possible pairs of ball brands when that club type was used (that is, for each pair of ball brands, test if the mean distances the balls were hit are significantly different when that club type was used). To further analyze which type of club was best/worst, we should, for each ball brand, conduct a pairwise comparison of the mean distances attained by the five iron and the driver when that ball brand was used.

We will need to rerun the 2-way ANOVA commands, but this time multiple comparison commands will be included. It is important to note that these multiple comparisons are looking 1) at pairwise intervals for each level of club (our first factor of interest) and 2) at pairwise intervals for each level of ball (our second factor of interest). This is because our model has interaction. Commands for multiple comparisons of ball brands within levels of club type, and for multiple comparisons of club types within levels of ball brand, follow. We will make Bonferroni intervals here. SPSS will not create Tukey intervals in this case, when we wish to look for pairwise comparisons of the means of one factor within the levels of another factor. (Note that the commands ADJ(LSD) or ADJ(SIDAK) can be used instead, though.)


Select Analyze>General Linear Model>Univariate.

a. Click Options
b. Select club*ball and move it into the Display Means for box
c. Click Continue
d. Click Paste to open the SPSS Syntax Editor
e. Complete the line starting with /EMMEANS = TABLES(club*ball) by adding COMPARE(club) ADJ(BONFERRONI)
f. Add another line underneath: /EMMEANS = TABLES(club*ball) COMPARE(ball) ADJ(BONFERRONI) (see Figure 3.23)
g. Select Run>All
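After editing, the complete pasted command should look approximately like this:

UNIANOVA distance BY club ball
  /EMMEANS=TABLES(club*ball) COMPARE(club) ADJ(BONFERRONI)
  /EMMEANS=TABLES(club*ball) COMPARE(ball) ADJ(BONFERRONI)
  /DESIGN=club ball club*ball.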

Figure 3.23: Syntax for Analysis of Treatment Means

SPSS conducts multiple comparisons (Bonferroni procedures) of the mean distances by brand for each club separately, and by club for each brand separately – see Figures 3.24 and 3.25 below. Figure 3.24 shows the results of comparisons of the mean distances attained by driver and five iron for each brand of balls separately. At any reasonable experiment wise error rate the mean distances for driver and five iron are significantly different for all brands of balls. The mean distance made using the driver is always greater than the mean distance using the five iron.


Figure 3.25 shows the results of comparisons of the mean distances attained by all the ball brand pairs (there are 6 distinct pairs) for each club type separately. For drivers, brand C significantly outperforms brand A and brand D at an experiment wise error rate of 5%. For five irons, brand B significantly outperforms brand C and brand D at an experiment wise error rate of 5%. No other significant results are observed at an experiment wise error rate of 5%. The star beside a mean difference indicates that the difference is significantly different from 0. One could also check the corresponding confidence intervals and find that none of them includes zero.

Estimates
Dependent Variable: distance

club  ball  Mean  Std. Error  95% CI Lower Bound  95% CI Upper Bound

Driver A 228.425 2.923 222.393 234.457

B 233.725 2.923 227.693 239.757

C 243.100 2.923 237.068 249.132

D 229.750 2.923 223.718 235.782

Five iron A 171.300 2.923 165.268 177.332

B 182.675 2.923 176.643 188.707

C 167.200 2.923 161.168 173.232

D 160.500 2.923 154.468 166.532

Figure 3.24: Multiple Comparison of Mean Distances by Club for Each Brand

The output in Figure 3.24 gives the outcomes of four multiple comparisons (one for each Brand of ball) of the mean distances for the driver and five iron. All results are significant. The mean distance made using the driver is always greater than the mean distance using the five iron.


Figure 3.25: Multiple Comparison of Mean Distances by Brand for Each Club

The output in Figure 3.25 gives the outcomes of two multiple comparisons (one for the driver and a second for the five iron) of the mean distances for the different brands of balls.

Diagram for Driver

   A        D        B        C
 228.4    229.8    233.7    243.1
 ------------------------
                   --------------

The diagram shows that at an experiment wise error rate of 5%, balls from brand C went significantly farther when hit with the driver than balls from brands A and D. No other differences are significant at this error rate.

Diagram for Five Iron

   D        C        A        B
 160.5    167.2    171.3    182.7
 ------------------------
                   --------------

The diagram shows that at an experiment wise error rate of 5%, balls from brand B went significantly farther when hit with the five iron than balls from brands C and D. No other differences are significant at this error rate.


Note: If you wished Bonferroni, Tukey, or any of the other pairwise comparison methods offered to compare the marginal pairwise means for ball and/or for club, you could have included those requests in the Post Hoc commands when running the Analyze>GLM>Univariate commands. However, marginal comparisons are not of interest here because there is interaction in the model. They would only have been of interest if no interaction had been found, and then only for factors found significant.

IMPORTANT: In general, if a 2-way ANOVA model does not have interaction, and a factor turns out to be significant, then pairwise multiple comparison intervals should not be created separately for particular levels of the other factor. In this case, it is correct to create multiple comparison intervals for the differences between the marginal pairwise means themselves. This should be done from the Post Hoc box when following the Analyze>General Linear Model>Univariate path. This is what we did in the randomized complete block question above, where the model assumes no interaction between language tests (first factor) and blocking (second factor). There, when the factor language test turned out to be significant, we looked at the marginal pairwise comparison intervals between mean scores on the language tests (we created intervals to compare μWAIS to μMWAIS, μMWAIS to μWSA, and μWAIS to μWSA).


Chapter 4 Non-Parametric Statistics

4.1 Wilcoxon (Mann-Whitney) Rank Sum Test for 2 Independent Samples

Assumptions:

1. Independent simple random samples
2. Numerical response variable
3. Same shaped populations with equal variances
4. Continuous population distributions (so not many ties)
5. At least 10 observations in each sample

Note the absence of a normality assumption.

HO: LOCATION OF D1 AND D2 IS THE SAME
HA: LOCATIONS OF D1 AND D2 DIFFER (OR D1 IS SHIFTED RIGHT OF D2, OR D1 IS SHIFTED LEFT OF D2)
(WHERE D1 AND D2 ARE IDENTICALLY SHAPED DISTRIBUTIONS FROM WHICH THE TWO INDEPENDENT SIMPLE RANDOM SAMPLES WERE CHOSEN)

Example: Do patients taking Drug A take less time to recover? A new medicine, Drug A, has been developed for treating patients with low hemoglobin counts. The pharmaceutical company that developed the new medicine is planning to advertise that it is superior to another medicine, Drug B, currently in use. As evidence the company uses the number of days to recovery of a sample of patients who were independently and randomly assigned to one or the other of the two drugs.

The data:

Drug A: 14, 10, 1, 12, 11, 14, 8, 10, 2, 12, 16, 12, 12, 15, 4 Drug B: 17, 15, 5, 14, 18, 3, 16, 13, 15, 16, 17, 8, 19

The data are given in the SPSS data file drug.sav. Do patients taking Drug A take less time to recover? Conduct either a t-test or the two-sample Wilcoxon rank sum test. Use a 5% significance level in your test.

Solution:

In order to decide which test to use, first we check the assumptions. The side-by-side boxplots and histograms showing the time in days to recovery by the two drugs are shown in Figures 4.1 and 4.2.

To draw the boxplots,

1. Select Graphs>Legacy Dialogs>Boxplot
2. Choose Simple and Summaries for groups of cases
3. Click Define
4. Variable: Time
5. Category Axis: Drug
6. Click OK
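If you click Paste instead of OK, SPSS shows that this dialog builds an EXAMINE command. A sketch of the pasted syntax (exact defaults may vary by version; variable names Time and Drug as in drug.sav):

    EXAMINE VARIABLES=Time BY Drug
      /COMPARE GROUPS
      /PLOT=BOXPLOT
      /STATISTICS=NONE
      /NOTOTAL.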


Figure 4.1: Side by Side Boxplots Showing Time (in days) to Recovery

To draw the histograms,

7. Select Graphs>Legacy Dialogs>Histogram
8. Check Display normal curve
9. Choose Time for the variable
10. Use Drug for rows
11. Click OK
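The paneled histograms can be produced with syntax along these lines (a sketch; we assume your SPSS version supports the /PANEL subcommand of GRAPH):

    GRAPH
      /HISTOGRAM(NORMAL)=Time
      /PANEL ROWVAR=Drug ROWOP=CROSS.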

Figure 4.2: Histograms for Drug A and B, with Normal Curve

You can also get the box-plots and separate histograms for the two drugs by using Analyze>Descriptive Statistics>Explore


Based on the side-by-side boxplots and histograms, it appears reasonable to assume that the underlying distributions are similar in shape. However, both distributions are skewed to the left and do not appear to be normal.

Both indicate that the Wilcoxon Rank Sum Test is the appropriate tool to use.

SPSS only accepts numerical grouping variables for the 2-sample test, so we have to recode the variable Drug into a numerical variable, say n_drug, prior to performing the WRS test. Commands to transform the data are:

1. Transform>Recode>Into Different Variables...
2. Double click the variable Drug and enter the name and label of the new variable, n_drug
3. Click Change
4. Select Old and New Values
5. Enter the old values (A, B) and the new values (1, 2), clicking the Add button after each (see Figure 4.3)
6. Select Continue to close the Old and New Values dialog box
7. Click OK to close the Recode into Different Variables dialog box
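Equivalently, the recoding can be done with a short piece of syntax (a sketch, assuming Drug is a string variable with values A and B):

    RECODE Drug ('A'=1) ('B'=2) INTO n_drug.
    VARIABLE LABELS n_drug 'Drug group (1=A, 2=B)'.
    EXECUTE.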

Figure 4.3: Recoding into a Numerical Variable

Now we can perform the Wilcoxon Rank Sum (Mann-Whitney) test:

Use Di, i = 1, 2, for the distribution shapes of the recovery days for the 2 populations (1 = Drug A and 2 = Drug B). Let M1 = the median number of days to recover on Drug A, and M2 = the median number of days to recover on Drug B. Let µ1 = the mean number of days to recover on Drug A, and µ2 = the mean number of days to recover on Drug B.

Step 1: Hypotheses: Ho: D1 − D2 ≥ 0 versus Ha: D1 − D2 < 0
(or Ho: M1 − M2 ≥ 0 versus Ha: M1 − M2 < 0)
(or Ho: µ1 − µ2 ≥ 0 versus Ha: µ1 − µ2 < 0)

Step 2: Assumptions: Independent simple random samples (as per sample design), numerical response variable, same shape continuous populations with equal variances, at least 10 observations in each sample.


Prior to obtaining the SPSS output to help finish the formal write-up for the hypothesis test, recall how the WRS test works. First, all data from both samples are ranked (when there are ties, each tied observation is assigned the mean of the ranks they would have had if no ties were present). Then the average ranks for the two samples are compared to see if they differ significantly from what would be expected if the null hypothesis were true. An average rank from one population that was significantly smaller than the average rank of the other population would imply that the median of the one population is significantly smaller than the median of the second population, and vice-versa. Let W = the sum of the ranks of sample 1. Since the sum of all n = n1 + n2 ranks is n(n+1)/2, the sum of the ranks of sample 2 can readily be determined if W is known. Therefore, it suffices to choose W as the test statistic. When both sample sizes are at least 10, the distribution of W, if Ho is true, is close to normal with

mean n1(n1 + n2 + 1)/2 and standard deviation √( n1 n2 (n1 + n2 + 1) / 12 ).

SPSS will calculate both the asymptotic p-value and an exact p-value. The exact p-value for a WRS (M-W) test can be calculated for any sample sizes, even those less than 10. However, if one chooses to report an exact p-value for sample sizes less than 10, and use it to perform a test, one should be very sure that the other assumptions are met. Due to the difficulty of making inferences from small samples to larger populations, many textbooks (such as Weiss) teach students that they should only perform the WRS test when both sample sizes are at least 10, and have them use the asymptotic p-value. We take a moment to look at the by-hand results for this example. This is often a good idea: with a small amount of data we can check that we are running the SPSS commands correctly, in order to ensure we get it right for larger samples.

SAMPLE 1   OVERALL RANK        SAMPLE 2   OVERALL RANK
   1            1                 2            3
   1            2                 2            5
   1            4                 2            6.5
   1            6.5               2           15
   1            8.5               2           17
   1            8.5               2           20
   1           10                 2           20
   1           12.5               2           23
   1           12.5               2           23
   1           12.5               2           25.5
   1           12.5               2           25.5
   1           17                 2           27
   1           17                 2           28
   1           20
   1           23

SUM SAMPLE 1 (WILCOXON W):  167.5        SUM SAMPLE 2:      238.5
AVERAGE SAMPLE 1:           11.16667     AVERAGE SAMPLE 2:  18.34615


SPSS commands for the Mann-Whitney test:

(a) Select Analyze>Nonparametric Tests>Legacy Dialogs>2 Independent Samples
(b) Select Time for the Test Variable List and n_drug as the Grouping Variable (see Figure 4.4)
(c) Click Define Groups, enter 1 for Group 1 and 2 for Group 2, then click Continue
(d) Click OK
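The pasted syntax for this dialog is short; a sketch (assuming the variables Time and n_drug defined above):

    NPAR TESTS
      /M-W=Time BY n_drug(1 2)
      /MISSING ANALYSIS.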

Figure 4.4: Wilcoxon (Mann-Whitney) Test

Step 3: U* = 47.5 (or W* = 167.5) (see Figure 4.5)
Step 4: P-value = 0.0095 (1/2 of the reported 2-sided P-value of 0.019) (see Figure 4.5)
Step 5: Since our P-value < 0.05, we reject Ho.
Step 6: We have significant evidence, at the 5% significance level, that patients using the new drug A recover faster than patients taking drug B.

Figure 4.5: SPSS Output for Mann-Whitney Non-Parametric Test


4.2 Inferences About Two Population Medians Using Wilcoxon's Signed Rank Tests for Paired Differences

The Wilcoxon Signed Rank Test for matched pairs experiments is used when the underlying population of differences does not have a normal distribution.

Assumptions:
1. Simple random paired samples
2. Symmetric differences
3. Continuous population distribution of differences (not many ties)

Example: We select a simple random sample of 20 boyfriend/girlfriend couples in the second year of their relationship and record the minutes they spend texting each other in a typical week. The data for these 20 couples can be found below and in the file boyfriendgirlfriendtextpairs.sav.

Boy    Girl   Difference
 45     40       -5
 50     45       -5
 56     52       -4
 62     58       -4
 66     63       -3
 77     75       -2
 77     76       -1
 78     78        0
 79     80        1
 85     87        2
 87     90        3
 89     93        4
 90     95        5
 96    102        6
 98    105        7
102    110        8
111    120        9
115    124        9
123    133       10
133    143       10

1. Examine boxplots and histograms for the boyfriend and girlfriend distributions. We can see that they are similar in shape, and that there are no outliers. When the distributions from which we calculate the differences are similar in shape, it follows that the distribution of the differences will be symmetric.


Figure 4.6

We defined a new variable diff = girlfriend − boyfriend, and then made a histogram. As can be seen, the distribution of the differences is quite symmetric.

Figure 4.7: Distribution of Difference in Minutes

2. Test, at a 5% significance level, if there is a difference in the median minutes of texting between boyfriends and girlfriends who are in the second year of their relationship.

We use D1 for the distribution of the boyfriend texting minute population and D2 for the distribution of the girlfriend texting minute population. We use Difference = Girlfriend Texting Minutes − Boyfriend Texting Minutes. We pre-chose α to be 0.05.


Step 1: H0: D1 = D2 vs. Ha: D1 ≠ D2
(or H0: Mdifferences = 0 versus Ha: Mdifferences ≠ 0)
(or H0: µdifferences = 0 versus Ha: µdifferences ≠ 0)

Step 2: Assumptions: We have a simple random paired sample. The distribution of texting minute differences is continuous (although we did end up with 4 ties because we recorded minutes in whole units). The distribution of differences, as shown above, looks to be symmetric.

Prior to obtaining the SPSS output to help finish the formal write-up for the hypothesis test, recall how the WSR test works. We calculate the absolute differences, |di|, for all pairs. If any of the di equal 0, they are removed from the experiment, and the number of pairs is reduced accordingly. The |di| are then ranked, and finally, the ranks are given signs that correspond to the signs of the original di. The di give us an indication of how far away the differences (of the pairs) are from 0. If two or more of the absolute paired differences are tied, each is assigned the mean of the ranks they would have had if there were no ties. If the null hypothesis is true, then we would expect the sum of the positive ranks and the sum of the negative ranks to be similar in magnitude. That is, we would expect both sums to be close to n(n + 1)/4. We compare them to see if they differ significantly from what would be expected if the null hypothesis were true. The test statistic W = the sum of the positive ranks is chosen. When the number of pairs n is above about 20, the distribution of W is close to normal with

mean n(n + 1)/4 and standard deviation √( n(n + 1)(2n + 1) / 24 ).

SPSS will calculate the p-value ONLY for this situation (that is, it will only calculate the asymptotic P-value). We take a moment to look at the by-hand results for this example. This is often a good idea: with a small amount of data we can check that we are running the SPSS commands correctly, in order to ensure we get it right for larger samples.

In our case, we have one difference of 0, so we remove it from the data prior to doing the test. This means that we will have 19 paired differences. 19 is close enough to 20 that we decide to proceed with the test.

Boyfriend   Girlfriend   Diff   Abs Diff   Rank   Signed Rank
   45           40        -5       5        11        -11
   50           45        -5       5        11        -11
   56           52        -4       4         8         -8
   62           58        -4       4         8         -8
   66           63        -3       3        5.5       -5.5
   77           75        -2       2        3.5       -3.5
   77           76        -1       1        1.5       -1.5
   79           80         1       1        1.5        1.5
   85           87         2       2        3.5        3.5
   87           90         3       3        5.5        5.5
   89           93         4       4         8          8
   90           95         5       5        11         11
   96          102         6       6        13         13
   98          105         7       7        14         14
  102          110         8       8        15         15
  111          120         9       9       16.5       16.5
  115          124         9       9       16.5       16.5
  123          133        10      10       18.5       18.5
  133          143        10      10       18.5       18.5


The absolute total of the positive ranks above is 141.5 and the absolute total of the negative ranks above is 48.5. This matches the information in the output below in Figure 4.8. Here we use the difference "Girlfriend − Boyfriend" minutes. You must put Boyfriend into SPSS first and Girlfriend second if you wish the test to present positive and negative ranks matching this difference.

SPSS commands for Wilcoxon signed rank tests:

i. Select Analyze>Nonparametric Tests>Legacy Dialogs>2 Related Samples...
ii. Select Boyfriend - Girlfriend as the Test Pair(s)
iii. Check Wilcoxon in the Test Type box
iv. Click OK
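A sketch of the corresponding pasted syntax (assuming the paired variables are named Boyfriend and Girlfriend):

    NPAR TESTS
      /WILCOXON=Boyfriend WITH Girlfriend (PAIRED)
      /MISSING ANALYSIS.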

Step 3: Z* = -1.874
Step 4: P-value = 0.061 (2-sided, as provided by SPSS)
Step 5: Since P-value > α (0.061 > 0.05), we do not reject Ho.
Step 6: The data do not provide evidence, at a 5% significance level, that the median of the "girlfriend − boyfriend" texting minute differences differs from 0. We do note, however, that our P-value of 0.061 is relatively close to our pre-chosen level of significance of 0.05.

Ranks

                                      N     Mean Rank   Sum of Ranks
Girlfriend - Boyfriend
    Negative Ranks                    7(a)    6.93          48.50
    Positive Ranks                   12(b)   11.79         141.50
    Ties                              0(c)
    Total                            19

a. Girlfriend < Boyfriend
b. Girlfriend > Boyfriend
c. Girlfriend = Boyfriend

Test Statistics(b)

                            Girlfriend - Boyfriend
Z                                  -1.874(a)
Asymp. Sig. (2-tailed)               .061

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Figure 4.8: SPSS Output for the Wilcoxon Signed Rank Test


4.3 The Kruskal-Wallis Test for k independent samples

The Kruskal-Wallis test is a generalization of the Wilcoxon Rank Sum test: it extends the comparison of the locations of two distributions, based on two independent samples, to the comparison of the locations of k distributions, based on k independent samples. It can be used to test equality of medians when the parent distributions are similar in shape. The hypotheses can thus be worded in terms of the population medians, if one wishes.

Assumptions:

1. Independent simple random samples
2. Numerical response variable
3. Same shaped populations with equal variances
4. Continuous population distributions (so not many ties)
5. At least 5 observations in each sample

HO: LOCATION OF ALL Di DISTRIBUTIONS IS THE SAME, i = 1, 2, …, k
HA: LOCATION OF AT LEAST ONE OF THE Di DISTRIBUTIONS DIFFERS
(WHERE D1, D2, …, Dk ARE IDENTICALLY SHAPED DISTRIBUTIONS FROM WHICH THE k INDEPENDENT SIMPLE RANDOM SAMPLES WERE CHOSEN)

Example: The carbon monoxide level was measured (in parts per million) at three randomly selected industrial sites. Data can be found below and in the file carbonmonoxide.sav. Is there a significant difference in carbon monoxide levels at the three sites? Test at a significance level of 10%.

Site A: 0.106, 0.127, 0.132, 0.105, 0.117, 0.109, 0.107, 0.109

Site B: 0.121, 0.119, 0.121, 0.120, 0.117, 0.134, 0.118, 0.142

Site C: 0.119, 0.110, 0.106, 0.118, 0.115, 0.121, 0.109, 0.134

1. How do you classify the shapes of the distributions? Are the three distributions similar in shape?
2. Should the ordinary F-test or the Kruskal-Wallis test be used to compare the centers of the three distributions?
3. Conduct the appropriate test at a significance level of 10%.
4. If appropriate, conduct a multiple comparison for the centers of the distributions of the carbon monoxide levels at the three industrial sites.

Solution:

1. The boxplots in Figure 4.9 show that all three distributions are skewed right (the top 25% have the widest range). Thus, normality is not a reasonable assumption. However, the distributions are somewhat similar in shape and look to have variances that are not dissimilar. This is borne out by examination of the histograms. A summary of descriptive statistics for these 3 sites is provided in the table below (output is not included). The ratio of the largest to the smallest standard deviation is .010323/.008832 = 1.168818 < 2.


          Mean      Median    Standard Deviation
Site A    .11400    .10900    .010323
Site B    .12400    .12050    .009008
Site C    .11800    .11650    .008832

Figure 4.9: Boxplots and Histograms to investigate Distributions of Carbon Monoxide Level at Three Sites


2. Because we do not have normal distributions of CO levels at the sites, the Kruskal-Wallis test should be used for comparing the center (median) of these distributions.

3. We perform the Kruskal-Wallis Test.

Step 1: H0: D1 = D2 = D3 versus Ha: At least one distribution is shifted to the right or left of the others. Use α = 0.10

Step 2: Assumptions: Independent simple random samples (as per sample design), numerical response variable of carbon monoxide levels, same shape continuous populations (checked) with equal variances (checked), and all sample sizes are at least 5 (checked).

To perform a Kruskal-Wallis test, we rank the data from all samples combined, and calculate the average rank, R̄i, for each sample. The sum of all n ranks is n(n+1)/2, so the overall mean of the n ranks is (n+1)/2. The Kruskal-Wallis test statistic H is based on a weighted sum of the squared differences between the R̄i and (n+1)/2. H measures the variation among the mean ranks, and it follows (approximately) a chi-square distribution with k − 1 degrees of freedom when Ho is true:

H = ( 12 / (n(n + 1)) ) Σ ni ( R̄i − (n + 1)/2 )²

where ni is the size of sample i. If the Di are all equal, then the mean ranks R̄i are all close to the overall mean rank (n+1)/2, and H will be small; but if any one (or more) of the Di is/are shifted away from the others, the corresponding samples will have average ranks that differ from (n+1)/2, and H will tend to be larger.

Site A   Overall Rank      Site B   Overall Rank      Site C   Overall Rank
0.105         1            0.117        10.5          0.106         2.5
0.106         2.5          0.118        12.5          0.109         6
0.107         4            0.119        14.5          0.110         8
0.109         6            0.120        16            0.115         9
0.109         6            0.121        18            0.118        12.5
0.117        10.5          0.121        18            0.119        14.5
0.127        20            0.134        22.5          0.121        18
0.132        21            0.142        24            0.134        22.5

Sum          71                        136                         93
Average       8.875                     17                         11.625

H = ( 12 / (24(25)) ) [ 8(8.875 − 12.5)² + 8(17 − 12.5)² + 8(11.625 − 12.5)² ]
  = 0.02( 8(−3.625)² + 8(4.5)² + 8(−0.875)² )
  = 0.02(8)(13.140625 + 20.25 + 0.765625)
  = (0.02)(8)(34.15625)
  = 5.465

(Note that due to ties and SPSS computing the test statistic in a slightly different manner, the H found here does not match the H in the SPSS output. However, the sums and averages of the ranks for the three treatments match, so we know that we have set the problem up correctly when we submitted it to SPSS.)


SPSS commands:
1. Select Analyze>Nonparametric Tests>Legacy Dialogs>K Independent Samples...
2. Select CarbonMon for the Test Variable List and site as the Grouping Variable (see Figure 4.10)
3. Click Define Range and type 1 for Minimum and 3 for Maximum, then click Continue
4. Check the Kruskal-Wallis H button in the Test Type box
5. Click OK
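A sketch of the corresponding pasted syntax (assuming the variables are named CarbonMon and site, as in carbonmonoxide.sav):

    NPAR TESTS
      /K-W=CarbonMon BY site(1 3)
      /MISSING ANALYSIS.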

Step 3: H* = 5.496 with 2 df (see Figure 4.11)
Step 4: P-value = 0.064
Step 5: Since our P-value < 0.10, reject Ho.
Step 6: We have significant evidence, at the 10% significance level, that at least one of the sites has a carbon monoxide level distribution that differs in location from the others.

Figure 4.10: Dialog Box for the Kruskal-Wallis Test


Figure 4.11: SPSS Output for the Kruskal-Wallis Test

4. Since the Kruskal-Wallis test indicates that not all three medians are equal, we want to know where the differences are. In order to control the experiment wise error rate, a Bonferroni procedure should be used.

Choose an experiment wise error rate of 10%.
The number of comparisons we have to do is c = k(k − 1)/2 = 3.
Then the comparison wise error rate has to be α* = α/c = 0.0333.

Now we have to compare the median for each industrial site with the median for each other industrial site. We use the Wilcoxon Rank Sum test for this, testing H0: Mi = Mj versus Ha: Mi ≠ Mj for each pair i, j at α* = 0.0333. The table below summarizes the results of the three SPSS runs.
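A sketch of syntax for the three pairwise tests (each command restricts the grouping variable site to one pair of sites; variable names as above):

    NPAR TESTS /M-W=CarbonMon BY site(1 2).
    NPAR TESTS /M-W=CarbonMon BY site(1 3).
    NPAR TESTS /M-W=CarbonMon BY site(2 3).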

Sites    Test Statistic    P-value    Decision
A - B        48.500         0.038     do not reject H0
A - C        58.500         0.328     do not reject H0
B - C        51.500         0.083     do not reject H0

Even though the Kruskal-Wallis test was significant, at an error rate of 10% we do not find any pairwise significant difference between the median carbon monoxide levels for the three industrial sites using Bonferroni's procedure. It is worth noting that the A-B P-value of 0.038 is very close to the α* of 0.033, and we have a result that borders on significant when we look at the difference between sites A and B in this situation.


Chapter 5 Simple Linear Regression

Objectives: After studying this chapter you should be able to

1. Create an X-Y scatterplot for two quantitative variables
2. Perform a simple linear regression analysis
3. Test hypotheses concerning the linear relationship between two quantitative variables
4. Evaluate the goodness of fit of a linear regression model
5. Check the model assumptions

5.1 Linear Regression Model

Many statistical studies are designed to explore the relationship between quantitative variables, such as the relationship between height and weight of people, the concentration of an injected drug and heart rate, or the consumption level of some nutrient and weight gain. The nature and strength of the relationship between two variables of interest, such as these, may be examined by two important statistical techniques called regression and correlation analysis.

Consider the simple case where there is just one explanatory (independent) variable X and one response (dependent) variable Y. The response variable depends on the explanatory variable. We assume the mean response can be expressed as a linear function of the explanatory variable:

µy = β0 + β1x

This expression is the population regression equation. β0 is the intercept of the line and β1 is the slope of the line. We cannot directly observe this equation because the observed values of y vary about their means. We can think of subpopulations of responses, each corresponding to a particular observed explanatory variable x. In each subpopulation, y varies normally with mean given by the population regression equation. The regression model assumes that the standard deviation σ of the responses is the same in all subpopulations. The simple linear regression (SLR) model can be written

y = β0 + β1x + ε,   ε ~ N(0, σ²)

The model parameters are β0, β1, and σ. The random errors corresponding to the subpopulations of responses are assumed uncorrelated. The non-random (deterministic) part of the SLR model, the line relating x and y, is the population regression line mentioned above.


It is of interest to fit a line of "best" fit to n observed data points (xi, yi), i = 1, …, n. We choose to fit a "least squares line" that minimizes the sum of the squared vertical deviations of the points (x1, y1), …, (xn, yn) from the line. Some calculus will provide us with b0 and b1, the least squares estimates of β0 and β1. The simple regression line (based on the observed units) can be written as

ŷ = b0 + b1x

where ŷ is the predicted value for a given value x of the explanatory variable.

Example: The SPSS data file sbp.sav contains data on systolic blood pressure (SBP) and age for a sample of 30 individuals.

1. Construct a scatter diagram and describe the relationship between SBP and age.
2. Obtain the estimated regression line of sbp on age.
3. Obtain an estimate of σ.
4. Find the correlation coefficient and the coefficient of determination for sbp and age.
5. Conduct a test, at a 1% level of significance, to decide whether or not there is a positive linear association between SBP and age.
6. Obtain 95% confidence intervals for the slope parameter β1 and the intercept β0.
7. Obtain a 95% confidence interval for the estimate of the mean SBP of all individuals aged 65 years.
8. Predict with 95% confidence the SBP of an individual whose age is 65.
9. Check the model assumptions by using the appropriate graphical methods and tests.

Solution:

1. To create a scatter diagram of SBP and age using SPSS,

(a) Choose Graphs>Legacy Dialogs>Scatter/Dot
(b) Click the Simple Scatter icon and select Define
(c) Specify sbp in the Y Axis box and age in the X Axis box
(d) Click Titles, type SBP versus Age, and click Continue
(e) Click OK
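A sketch of the equivalent pasted syntax (variable names sbp and age as in the data file):

    GRAPH
      /SCATTERPLOT(BIVAR)=age WITH sbp
      /TITLE='SBP versus Age'.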

The scatter diagram appears in the SPSS viewer window (see Figure 5.1). There appears to be a moderately strong, positive, linear association between age and SBP. One outlier seems to be included in the data set.


Figure 5.1: Scatter Diagram of Systolic Blood Pressure versus Age

2. To estimate the regression line of sbp on age using SPSS:

(a) Choose Analyze>Regression>Linear
(b) Select sbp for the Dependent box
(c) Select age for the Independent(s) box
(d) Click Statistics, check Estimates in the Regression Coefficients box and Model fit, then click Continue
(e) Click Options, check Include constant in equation, then click Continue
(f) Click OK
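A sketch of the pasted syntax for these steps (default criteria subcommands omitted; they may vary by version):

    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT sbp
      /METHOD=ENTER age.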

Figure 5.2: Regression Dialog Box


From the SPSS output in Figure 5.3, we can see that the estimated regression equation is

predicted sbp = 98.715 + 0.971 × age

Figure 5.3: Estimated Regression Coefficients

3. In Figure 5.4 we see the output for the model summary and ANOVA. The estimate for σ is given in the last column of the model summary table. An alternative way to calculate the estimate uses information from the ANOVA table:

Estimate of σ = √MSE = √( SSE / dfE ) = 17.314

Figure 5.4: Model Fit

4. From the model summary table, the correlation coefficient r = 0.658 and the coefficient of determination R² = 0.432 = 43.2%. Thus, we have a moderate to weak linear relation, and 43.2% of the variation in the variable sbp is explained by the linear regression on age.

5. The hypotheses are

H0: β1 ≤ 0
Ha: β1 > 0

From the SPSS output in Figure 5.3, the test statistic is t = 4.618 with df = dfE = 28, and the one-sided P-value < 0.001/2 = 0.0005. Since the P-value is very small, the null hypothesis is rejected and we conclude, at the 1% significance level, that there is a positive linear relationship between sbp and age.


6. To obtain confidence intervals for the slope and the intercept parameters choose Analyze>Regression>Linear, select Statistics and check Confidence intervals. From the table in Figure 5.5 we can see that the 95% confidence interval for β1 is (0.54, 1.401) and the 95% confidence interval for β0 is (78.230, 119.200).

7. To calculate a confidence interval for the mean of y for a given value of x with SPSS, choose Analyze>Regression>Linear, select Save, and check Mean in the Prediction Intervals box. If you also check Individual, you get the individual prediction intervals too. (In the Predicted Values box, check Unstandardized; this returns the point estimates for your confidence interval and prediction interval.) Type 95 in the Confidence Interval text box (see Figure 5.6). The lower bounds of the confidence intervals for the average sbp are in the column LMCI_1 and the upper bounds are in the column UMCI_1. The 95% confidence interval estimate for the average sbp of all people 65 years old is (151.09234, 172.55024) (see row number 5).
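A sketch of syntax that fits the model and saves the interval bounds (MCIN saves the mean confidence interval bounds and ICIN the individual prediction interval bounds; CIN(95) sets the interval percentage):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /CRITERIA=CIN(95)
      /DEPENDENT sbp
      /METHOD=ENTER age
      /SAVE PRED MCIN ICIN.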

Figure 5.6: Prediction Intervals

8. The lower bounds of the individual prediction intervals are in the column LICI_1 and the upper bounds are in the column UICI_1. The 95% prediction interval for the sbp of a particular 65 year old is (124.76836, 198.87422) (see row number 5).

Figure 5.5: Confidence Intervals for the Slope and the Intercept Parameters


Note: If you wish to obtain an individual prediction interval for an age of interest that does not appear in the data values, type that age in the cell below the bottom entry in the age column in the data file, then run the commands above again. The upper and lower bounds of a prediction interval for the sbp at the age of interest will appear in the data file.

5.2 Residual Analysis

Our model makes the following assumptions about the error terms εi of the model.

Linearity: The εi have a mean of 0
Independence: The εi are independent
Normality: The εi are normally distributed
Homogeneity of variances: The εi have the same variance σ²

Some properties of residuals: Since the error terms εi are unknown, we use the residuals, ei = yi − ŷi, as estimates of the error terms. The residuals average to zero:

E(ei) = 0

The ei are functions of the xi, and as such, have different variances. In fact, the further away an xi is from x̄, the smaller V(ei) is:

V(ei) = σ²(1 − hii),   where   hii = 1/n + (xi − x̄)²/Sxx   and   Sxx = Σ(xi − x̄)²

The quantities ei / ( σ√(1 − hii) ) are approximately N(0,1) (note: this requires normality of the model errors εi). They are not independent; however, as long as the hii are fairly close to zero, they can be considered to be approximately independent. We can substitute s² = MSE = SSE/(n − 2) for σ², and create studentized residuals:

si = ei / ( s√(1 − hii) )

where the hii can be viewed as "leverage" values that help indicate how far an xi value lies from the mean x̄. They can be shown to be functions of only the xi in the model, and they are such that 0 ≤ hii ≤ 1.

Observation: If the εi have the same variance σ², then the studentized residuals have (approximately) a Student's t distribution with n − 2 degrees of freedom. This, of course, is rather close to a N(0,1) distribution for large enough n. Further details about the above material on residuals can be found in statistics texts such as "Probability and Statistics for Engineering and the Sciences, Eighth Edition", by Jay L. Devore, Brooks/Cole, 2012.

Graphical techniques are used in the residual assessment. A probability plot of residuals and/or a histogram can check for normality.


Plotting residuals versus the independent variable, and/or plotting residuals versus the predicted values, allows us to check whether the errors are centered at 0 and have equal variances. If these assumptions are met, then a plot of the residuals should show a random pattern, and the residuals should appear in a horizontal band centered around 0. If this is not the case, then one of these assumptions is not being met. Plotting residuals versus the independent variable when the independent variable has a natural ordering to it (such as time) also allows us to make sure that no patterns of dependency exist. These plots work quite well when there is only one independent variable.

Patterns to watch out for are shown below in Figure 5.6.

Figure 5.6: Patterns of Note in Plots of Residuals versus Predicted Values (panels a, b, c, d)

Pattern a) is satisfactory; a horizontal band centered on 0 appears. Patterns b) and c) can indicate that the assumption of equal variances is doubtful (with variance increasing with time, x, or the predicted values in b), and variance unequal across time, x, or the predicted values in c)). Pattern d) can indicate a non-linear relationship, and that a transformation of the x variable may be needed to bring linearity to the model. Since the size of the residuals will depend on the particular problem at hand, it often facilitates residual analysis to standardize or studentize the residuals.

Standardized residuals may be found by dividing the residuals ei = yi − ŷi by the estimate s = √MSE of σ. Furthermore, since the variance of the residuals decreases as the xi values move further from their mean, many texts (and your instructor) suggest that the use of studentized residuals in residual analysis is perhaps more sensible. Studentized residuals are always larger in magnitude than the corresponding standardized residuals, because the hii are always between 0 and 1; this makes studentized residuals more sensitive to outlying values. SPSS has a feature that allows calculation of unstandardized, standardized, and studentized residuals. To save the residuals and the predicted values for a residual analysis, from the regression window:

(a) Select Save...
(b) Check Unstandardized, Standardized, and Studentized in the Residuals box
(c) Check Unstandardized in the Predicted Values box
(d) Click Continue
(e) Click OK
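A sketch of the equivalent /SAVE subcommand added to the regression syntax (RESID, ZRESID, and SRESID save the unstandardized, standardized, and studentized residuals, respectively):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT sbp
      /METHOD=ENTER age
      /SAVE PRED RESID ZRESID SRESID.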


The unstandardized residuals, standardized residuals, studentized residuals, and unstandardized predicted values are now saved as the variables RES_1, ZRE_1, SRE_1, and PRE_1 in the SPSS worksheet. (Standardized predicted values can also be saved; it is easier to identify the mean of standardized predicted values on the axis of a graph than it is for unstandardized predicted values. We will use unstandardized predicted values in our exploration of model fitness, mainly to facilitate looking up predicted values of interest.) To check if it is reasonable to assume that the error is normally distributed, obtain a normal Q-Q plot and a histogram for each of RES_1, ZRE_1, and SRE_1. The commands below indicate how to do this for RES_1. Analogous commands would do this for ZRE_1 and SRE_1.

(a) Click Analyze>Descriptive Statistics>Q-Q Plots...
(b) Choose RES_1 (Unstandardized Residuals) as the variable
(c) Make sure the Test Distribution is Normal
(d) Click OK
(e) Click Graphs>Legacy Dialogs>Histogram
(f) Choose RES_1 (Unstandardized Residuals) as the variable
(g) Click OK
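A sketch of syntax for these two plots (the Q-Q dialog pastes a PPLOT command; default subcommands may vary by version):

    PPLOT
      /VARIABLES=RES_1
      /TYPE=Q-Q
      /DIST=NORMAL.

    GRAPH
      /HISTOGRAM=RES_1.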

Note that in practice, one would choose one variable (likely the studentized residuals) for the y axis and one variable (likely the standardized predicted values) for the x axis. All graphs are included so that students can see their similarity. However, students are encouraged to note that the graphs with the studentized residuals do highlight points that have more of an influence on the regression. The outcome appears in Figures 5.7 and 5.8.

Figure 5.7: Q-Q Plot of the Standardized Residuals

Note that scaling the data gives us a better perspective on the “closeness” of the points to the line.

Figure 5.8: Histogram of the Unstandardized, Standardized, and Studentized Residuals


With the exception of an outlier, the histogram of residuals is fairly normal and centered at 0. The Q-Q plot likewise indicates a fairly normal distribution for the error, with the exception of the one outlier. A Q-Q plot is often preferable to a histogram when the number of points is small, although in this case n = 30, which is sometimes viewed as somewhat large in statistical applications. The one outlier is responsible for the smallish correlation coefficient: it pulls the regression line up above the scatter of the other points, so that the points are not as close to the regression line as they might be otherwise.

To check if it is reasonable to assume that the errors have a mean of 0 and constant variance, we can plot the residuals versus the independent variable, and/or plot the residuals versus the predicted values. Commands are provided below to plot the unstandardized residuals, RES_1, against the unstandardized predicted values, PRE_1. Analogous commands apply for ZRE_1 and SRE_1. Output is presented in Figures 5.9 and 5.10.

(a) Click Graphs>Legacy Dialogs>Scatter/Dot...
(b) Click the Simple Scatter icon and click Define
(c) Choose RES_1 (Unstandardized Residuals) for the Y axis
(d) Choose PRE_1 (Unstandardized Predicted Values) for the X axis
(e) Click OK
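A sketch of the equivalent pasted syntax (using the saved variables RES_1 and PRE_1):

    GRAPH
      /SCATTERPLOT(BIVAR)=PRE_1 WITH RES_1.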

Figure 5.9: Scatterplot of the Unstandardized, Standardized Residuals and Studentized Residuals versus the Unstandardized Predicted Value

With the exception of the readily identified outlying value, all plots of the residuals show what appear to be random points centred in a symmetric band of constant width around 0. The assumptions of linearity and constant variance appear to be met. A scatter plot that overlays the standardized and studentized residuals against the predicted values (see Figure 5.10) shows that the studentized residuals and standardized residuals are quite close, but does point out that the studentized approach views the identified outlier as even (slightly) more problematic.

Figure 5.10: Comparison plot of Standardized and Studentized residuals against Predicted values


Chapter 6 Multiple Linear Regression

Objectives: After studying this chapter you should be able to

1. Create a matrix plot
2. Fit a multiple regression model with SPSS
3. Conduct statistical inference concerning the regression coefficients
4. Use the multiple linear regression model for estimation and prediction
5. Analyze the residuals
6. Define multiple linear regression models with dummy variables
7. Apply variable selection techniques

6.1 The Multiple Regression Model

In the multiple linear regression setting, the response variable y depends not on just one, but on k explanatory variables x1, x2, ..., xk. The mean response is a linear combination of the explanatory variables:

µy = β0 + β1x1 + ... + βkxk

This expression is the population regression equation. We cannot directly observe this equation because the observed values of y vary about their means. We can think of subpopulations of responses, each corresponding to a particular set of values for all of the explanatory variables x1, x2, ..., xk. In each subpopulation, y varies normally with mean given by the population regression equation. The regression model assumes that the standard deviation σ of the responses is the same in all subpopulations. The multiple linear regression model can then be written as

y = β0 + β1x1 + ... + βkxk + ε,   ε ~ N(0, σ²)

The model parameters are β0, β1, ..., βk and σ. The random errors corresponding to the subpopulations of responses are assumed uncorrelated.

It is of interest to fit an equation of "best" fit to n observed data points (yi, x1i, x2i, …, xki), i = 1, …, n. Some calculus will provide us with b0, b1, …, bk, the least squares estimates of β0, β1, …, βk. The multiple regression equation (based on the observed values) can be written as

ŷ = b0 + b1x1 + ... + bkxk

where ŷ is the predicted value for given values x1, ..., xk of the explanatory variables.

Example: Fuel consumption in heating a home is a function of other variables such as outdoor air temperature (x1) and wind velocity (x2). For illustrative purposes, suppose the data in the SPSS file fuel.sav were collected to investigate how, for a sample of 10 winter days, the amount of fuel required to heat a home depends upon the outdoor temperature and wind velocity.


1. How strongly are the explanatory variables related to the response? Use a matrix plot (multiple scatter plots) and the correlation matrix for the data set to examine the pairwise relationships among the three variables.
2. Obtain the estimated regression equation for predicting fuel consumption from the two other variables. Interpret the coefficients. Report the standard error of the estimate (σ).
3. Test at a 1% significance level whether or not the model is useful for predicting the mean fuel consumption.
4. Test at a significance level of 1% whether or not temperature is linearly related to fuel consumption.
5. Check the model assumptions using the appropriate graphical techniques and tests.
6. Check if there are any outliers or influential observations.

Solution:

1. To create a matrix plot of fuel consumption, temperature, and wind velocity using SPSS:

(a) Choose Graphs>Legacy Dialogs>Scatter/Dot
(b) Click the Matrix icon and then select Define
(c) Select fuelc, temp, and wind for the Matrix Variables box
(d) Click Titles, type Matrix Plot of Fuel Consumption, Temperature and Wind Velocity, then click Continue
(e) Click OK
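A sketch of the equivalent pasted syntax (variable names fuelc, temp, and wind as in fuel.sav):

    GRAPH
      /SCATTERPLOT(MATRIX)=fuelc temp wind
      /TITLE='Matrix Plot of Fuel Consumption, Temperature and Wind Velocity'.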

In Figure 6.1, we see scatterplots relating each pair of these three variables. In the first row, both of the graphs have fuelc on the vertical axis: in the first column, fuelc forms the horizontal axis.

Figure 6.1: Matrix Plot of Fuel Consumption, Temperature, and Wind Velocity


To compute the correlation coefficients for each pair, do the following:

(a) Choose Analyze>Correlate>Bivariate (see Figure 6.2)
(b) Move fuelc, temp, and wind into the Variables box
(c) Click OK
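The pasted syntax for this dialog is a short CORRELATIONS command; a sketch:

    CORRELATIONS
      /VARIABLES=fuelc temp wind
      /PRINT=TWOTAIL NOSIG
      /MISSING=PAIRWISE.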

Figure 6.2: Correlation Dialog Box

In the SPSS Viewer window you will find the correlation matrix (see Figure 6.3), which reports the correlation between each pairing of the three variables: the correlation coefficient for a pair of variables appears at the intersection of the corresponding row and column. For example, fuel consumption and temperature have a significant negative correlation of -0.879, while fuel consumption and wind velocity have a non-significant positive correlation of only 0.424.

We expect fuel consumption to increase as the wind velocity increases, and to decrease as the temperature increases. The matrix plot and the correlation matrix confirm our expectation.

Figure 6.3: Correlation Matrix of Fuel Consumption, Temperature, and Wind Variables

Correlations

                                fuelc      temp      wind
fuelc   Pearson Correlation     1          -.879**   .424
        Sig. (2-tailed)                    .001      .222
        N                       10         10        10
temp    Pearson Correlation     -.879**    1         -.071
        Sig. (2-tailed)         .001                 .846
        N                       10         10        10
wind    Pearson Correlation     .424       -.071     1
        Sig. (2-tailed)         .222       .846
        N                       10         10        10

**. Correlation is significant at the 0.01 level (2-tailed).


2. To obtain the equation of the multiple regression model of fuel consumption on temperature and wind velocity using SPSS, enter the following commands (see Figure 6.4):

(a) Choose Analyze>Regression>Linear
(b) Select fuelc for the Dependent box
(c) Select temp, wind for the Independent(s) box
(d) Click Statistics, check Estimates in the Regression Coefficients box and Model fit, then click Continue
(e) Click Options, check Include constant in equation, then click Continue
(f) Click OK
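A sketch of the pasted syntax (note that both predictors appear on the one METHOD=ENTER line):

    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT fuelc
      /METHOD=ENTER temp wind.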

Figure 6.4: Regression Dialog Box

The results in Figure 6.5 are displayed in the SPSS Viewer window. We can see that we have one intercept and two slopes, one for each of the two explanatory variables. The estimated regression equation is:

predicted fuelc = 11.928 − 0.628 × temp + 0.130 × wind

The intercept (11.928) represents the fuel consumption when the temperature is zero degrees and the wind velocity equals zero. Each slope represents the mean change in fuel consumption associated with a one-unit increase in the corresponding explanatory variable, if the other explanatory variable remains unchanged. For example, if temperature were to increase by one degree, and wind velocity were to remain constant, then fuel consumption would decrease on average by 0.628 units. In the model summary output (see Figure 6.6), the standard error of the estimate (the estimate of σ) is equal to 1.22492.


Coefficients(a)

                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t        Sig.
1   (Constant)    11.928     .932                                             12.793   .000
    temp          -.628      .086                -.853                        -7.275   .000
    wind          .130       .042                .364                         3.102    .017

a. Dependent Variable: fuelc

Figure 6.5: Regression Coefficients

Figure 6.6: ANOVA Table

3. The F statistic given in the ANOVA table is used to test the overall significance of the regression model. The hypotheses are:

H0: β1 = β2 = 0
Ha: At least one of the βi ≠ 0, i = 1, 2

From the SPSS output in Figure 6.6, the test statistic is F = 33.036 with df1 = 2 and df2 = 7. Since the P-value (< 0.001) is smaller than 0.01, the null hypothesis is rejected, and we conclude at the 1% significance level that the multiple regression model fits the data; not all slope coefficients are zero.

4. The t statistics (in Figure 6.5) are used for testing the hypotheses H0: β1 = 0 versus Ha: β1 ≠ 0. The t statistic = −7.275 with df = dfE = 7 and the P-value < 0.001 in the second row of the table tell us to reject the null hypothesis at the 1% significance level. We conclude that temperature has a significant linear relationship with fuel consumption.

In multiple regression, we again can use the residuals, ei = yi − ŷi, as estimates of the error terms. As before, we can create studentized residuals:

si = ei / ( s√(1 − hii) )

where the hii can again be viewed as "leverage" values, here indicating when a point in x-space is remote from the rest of the data (that is, hii can be viewed as a measure of the distance of the point (xi1, …, xik) from the average of all the points in the data set). Also as before, the hii are functions of only the x values in the model, and they are such that 0 ≤ hii ≤ 1. The hii are not as readily calculated by hand in multiple regression situations, but fortunately, SPSS will do all the calculation for us.


5. Commands to save residuals to use in residual analysis follow:

(a) Select Save...
(b) Check Unstandardized, Standardized, and Studentized in the Residuals box
(c) Check Unstandardized and Standardized in the Predicted Values box
(d) Click Continue
(e) Click OK

In order to check the assumption of the error being normally distributed we should do a Q-Q plot (see Figure 6.7) and a histogram (see Figure 6.8) for the residuals. Commands for unstandardized residuals are given below. Commands for standardized and studentized residuals would be analogous.

(a) Click Analyze>Descriptive Statistics>Q-Q Plots...
(b) Choose RES_1 (Unstandardized Residuals) as the variable
(c) Click OK
(d) Click Graphs>Legacy Dialogs>Histogram...
(e) Choose RES_1 (Unstandardized Residuals) as the variable
(f) Click OK

Figure 6.7: Q-Q Plots

Figure 6.8: Histograms

The Q-Q plots display a slight sigmoid shape, indicating that the assumption of normal error might be violated. The studentized residuals tend to be further away from the line than the unstandardized or standardized ones, and given that the studentized residuals have an "edge" when it comes to discerning points to watch, this is something to note. A larger sample would be helpful.


The histograms of the residuals are centered at about 0, but do not resemble a bell curve. There are really too few points here to get a good idea of what is going on. The histograms, at least, do not seem to identify any outlying points, but could perhaps indicate less probability in the middle of the distribution, and more probability in the tails, than we would see in a normal distribution.

In order to check if the assumptions of linearity and homogeneity of variance are correct, scatter plots of residuals against the predicted values are of interest, as are scatterplots of residuals against the individual independent variables. Commands to obtain a scatterplot for RES_1 and PRE_1 follow. Other graphs would be obtained with analogous commands.

(a) Click Graphs>Legacy Dialogs>Scatter/Dot
(b) Click the Simple Scatter icon and click Define
(c) Choose RES_1 for the Y axis
(d) Choose PRE_1 for the X axis
(e) Click OK

Note that in practice, one would choose one variable (likely the studentized residuals) for the y axis and one variable (likely the standardized predicted values) for the x axis. All graphs are included so that students can see their similarity. However, students are encouraged to note that the graphs with the studentized residuals do highlight points that have more of an influence on the regression.

Figure 6.9: Scatterplots of Residuals versus Predicted Values


Figure 6.10a: Scatterplot of Residuals versus Temp

Figure 6.10b: Scatterplot of Residuals versus Wind

As with SLR, it is of interest to plot the residuals against the predicted values (Figure 6.9) and against each predictor (Figures 6.10a and 6.10b). One would expect a constant band to appear, symmetrically scattered around 0, if the assumptions of linearity and constant variance are met. However, with multiple regression, multi-collinearity (correlations among the independent variables) can become a problem, and it can be difficult to interpret these graphs. They may, however, identify points of interest that may be outlying. In addition, plotting the residuals against each independent variable can help to discern if problems exist, but even then, such issues are not always readily identified with these graphs.

No curvature appears in any of the plots and the data is randomly and somewhat symmetrically scattered in a constant band about 0; the assumptions of linearity (proper independent variables included in the model) and constant variance appear to be met.

With only ten data points, it is very difficult to investigate the residuals. The data on the graphs of the residuals versus the predicted values fall (roughly) in a horizontal band. None of these graphs shows any outlying (wind, temp) point strongly affecting the predicted value. However, Figures 6.10a and 6.10b do highlight, respectively, that one point has a temp value that is much smaller than all the other temp values, and one point has a wind value that is much higher than all the other wind values. A look in the data file identifies these as observation 7 ((temp, wind) = (−15.50, 5.90)) and observation 3 ((temp, wind) = (−10.00, 41.20)). What we see here is two lower temperatures (relative to the data in the data set), but in one case the wind is low and in the other case the wind is high! It is likely that with a larger data set, we would get a better picture here.


6. To identify influential observations, the following two additional statistics can be calculated: leverage value, and Cook’s Distance.

(a) Leverage value (hii) for Finding Cases with Unusual Explanatory Variable Values: If hii > 2p/n (= 6/10 = 0.6 in our case), then observation i has a high potential for influence, where p is the number of regression coefficients and n is the number of observations in the study.

(b) Cook's Distance (Di) for Finding Influential Cases: This is a measure based on the squared distance between the usual least squares estimates (the b's) computed from all n observations and the estimates obtained when the ith point is deleted. If a point is influential, its removal will result in some of the regression coefficients changing considerably. If Di is close to or larger than 1, then case i may be considered influential.

To calculate these statistics in SPSS from the regression window:
(a) Select Save
(b) Check Leverage values and Cook's under Distances
(c) Click Continue
(d) Click OK
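A sketch of the corresponding /SAVE keywords (LEVER saves the leverage values and COOK the Cook's distances; the saved variables are typically named LEV_1 and COO_1):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT fuelc
      /METHOD=ENTER temp wind
      /SAVE LEVER COOK.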

We create scatterplots of the Cook's distances and leverage values against Fuelc in order to discern whether there are any "outlying" values of interest (Figures 6.11a and 6.11b).
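The two legacy scatterplots can likewise be requested in syntax (again a sketch; it assumes the saved columns were named LEV_1 and COO_1):

    * Leverage (y-axis) versus fuelc (x-axis).
    GRAPH /SCATTERPLOT(BIVAR)=fuelc WITH LEV_1.
    * Cook's distance (y-axis) versus fuelc (x-axis).
    GRAPH /SCATTERPLOT(BIVAR)=fuelc WITH COO_1.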

Figure 6.11a: Leverage versus Fuelc

Figure 6.11b: Cook’s Distance versus Fuelc

One quick way to identify which case(s) have more influence is to double-click a graph to bring up the Chart Editor and then click on a point to bring up the Properties window. Then, on the Variables tab, under Case number, use the drop-down to select X Axis. This relabels the x-axis with case numbers, and you can then look up the point in the original data file. See Figures 6.12a and 6.12b.


Figure 6.12a: Leverage versus Fuelc BY Case Number

Figure 6.12b: Cook's Distance versus Fuelc BY Case Number

In Figure 6.12a we see that there are 2 observations – observation 7 (fuelc = 21.83) and observation 3 (fuelc = 23.76) – with leverages greater than 0.6. Note that these findings match our previous observations when we looked at the graphs of residuals versus the independent variables above. Recall that observations 7 and 3 have colder values of temperature than the other observations, but in one case wind is very low and in the other case wind is very high. In Figure 6.12b we have only one observation (observation 7 on the worksheet corresponding to fuelc = 21.83) with a Cook’s distance greater than 1 and substantially larger than the rest. Note that Cook’s distance measure does not identify case number 3 as influential.

This suggests we look further at observations 7 and 3 as potentially influential observations. Sometimes we may decide to eliminate "outlying" observations. In what follows, we eliminate observation 7 and then estimate the regression coefficients using the remaining 9 observations. Note: you want to be very careful when eliminating observations, as you will need to be able to justify your decision – evaluating the data for measurement anomalies is useful. Comparing the results displayed in Figures 6.13 and 6.5, a striking consequence of the exclusion of case 7 is the drop in significance of the two slope coefficients, for temp and wind (from two-sided P-values of <0.001 and 0.017, to two-sided P-values of 0.003 and 0.172 respectively). Now wind does not have a significant influence on fuel consumption (holding temp constant).
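One way to refit without case 7 is Data>Select Cases; the equivalent syntax sketch below uses the SPSS system variable $CASENUM and assumes the rows are still in their original order:

    * Keep every case except row 7.
    USE ALL.
    COMPUTE filter_$ = ($CASENUM ~= 7).
    FILTER BY filter_$.
    EXECUTE.
    REGRESSION
      /DEPENDENT fuelc
      /METHOD=ENTER temp wind.
    * Restore the full data set afterwards.
    FILTER OFF.
    USE ALL.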

Figure 6.13: Estimated Regression without Case 7


Finally, out of curiosity, let us see what happens if we choose to eliminate both observation 3 and observation 7. See Figure 6.14.

Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta                            t        Sig.
1   (Constant)    12.966     1.806                                              7.180    .001
    temp          -.753      .169               -.884                          -4.442    .007
    wind           .038      .110                .068                            .343    .745

a. Dependent Variable: fuelc

Figure 6.14: Estimated Regression without Cases 7 and 3

Elimination of both observation 3 and observation 7 produces a further drop in the significance of the two slope coefficients (from two-sided P-values of 0.003 and 0.172, to two-sided P-values of 0.007 and 0.745 respectively). This model continues to find that wind has no significant effect on fuel consumption (holding temp constant). Note that the 8 remaining (xi, yi) points are much closer to their respective means. We note that a larger sample would have been very useful here. Intuitively, the independent factor with the biggest impact on fuel consumption would be temperature, and it may be that a simple linear regression with temperature as the only independent variable fits well enough to allow for prediction, without needing or wanting to consider other factors. On the other hand, we may want to consider a model that pairs temperature with a different predictor than wind, such as how well insulated a house is.

6.2 Dummy Variables in Regression Analysis

Dummy variables, which are also called indicator variables, are used to employ categorical predictor variables in a regression analysis. By using dummy variables, we can broaden the application of regression analysis. As you saw before, regression analysis is used to analyze the relationship of quantitative variables; through the introduction of dummy variables, we can now include categorical variables as well. In particular, dummy variables allow us to employ regression to produce the same information obtained by analytical procedures such as analysis of variance and analysis of covariance. In this section we focus on one important application of dummy variables: comparing several regression equations by using a single multiple regression model.

Example: Suppose a random sample of data was collected on residential sales in a large city. The data consist of the sale price y (in $1000s), living area x1 (in hundreds of square feet), number of bedrooms x2, total number of rooms x3, age x4, and location of each house (dummy variables z1 and z2 defined as follows: z1 = z2 = 0 for intown; z1 = 1, z2 = 0 for the inner suburbs; z1 = 0, z2 = 1 for the outer suburbs). The data are available in the SPSS file house.sav.
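If the location had instead been stored as a single categorical column, the two indicators could be built with COMPUTE, since a logical comparison in SPSS evaluates to 1 (true) or 0 (false). The sketch below is hypothetical: house.sav already contains z1 and z2, and the variable name location with codes 1 = intown, 2 = inner suburbs, 3 = outer suburbs is assumed here.

    * Hypothetical recoding of a categorical location variable into dummies.
    COMPUTE z1 = (location = 2).
    COMPUTE z2 = (location = 3).
    EXECUTE.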

1. Identify a single regression model that uses the data for all three locations, and that defines straight-line models relating sale price (y) and area (x1) for each location.

2. Determine the fitted straight line for each location.

3. Test whether or not the straight lines for the three locations coincide.

4. Test whether or not the lines are parallel.


5. In light of your answers to parts (3) and (4), comment on the differences and similarities in the sale price-area relationship for the three locations.

Solution:

1. The multiple regression model can be written as: y = β0 + β1x1 + β2z1 + β3z2 + β4 x1z1 + β5x1z2 + ε

For each location, the simple regression equations are:

For intown: y = β0 + β1x1 + ε
For the inner suburbs: y = (β0 + β2) + (β1 + β4)x1 + ε
For the outer suburbs: y = (β0 + β3) + (β1 + β5)x1 + ε

To estimate the coefficients, we select Transform>Compute and add two more columns corresponding to x1z1 and x1z2. Then we select Analyze>Regression>Linear with y as the dependent variable and x1, z1, z2, x1z1, x1z2 as the independent variables. We get the results shown in Figure 6.15.
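The same steps in syntax form (a sketch, using the variable names above):

    * Interaction (product) terms for the dummy-variable model.
    COMPUTE x1z1 = x1 * z1.
    COMPUTE x1z2 = x1 * z2.
    EXECUTE.
    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x1 z1 z2 x1z1 x1z2.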

Figure 6.15: SPSS Output

2. From the SPSS output, the estimated linear regression equation for each location can be written by substituting the coefficient estimates from the B column of Figure 6.15 into the equations of part 1: the intown line has intercept b0 and slope b1; the inner-suburb line has intercept b0 + b2 and slope b1 + b4; and the outer-suburb line has intercept b0 + b3 and slope b1 + b5, where b0, ..., b5 denote the estimates of β0, ..., β5.

3. For each location, the simple regression is as in point 1 above. All three lines will coincide if all slopes and intercepts are the same. Thus, we must have:

β0 = β0 + β2 = β0 + β3


β1 = β1 + β4 = β1 + β5

This is only the case if β2 = β3 = β4 = β5 = 0. To determine whether this holds, we apply a partial F test:

Reduced Model: y = β0 + β1x1 + ε Full Model: y = β0 + β1x1 + β2z1 + β3z2 + β4 x1z1 + β5x1z2 + ε

The hypotheses are: H0: β2 = β3 = β4 = β5 = 0, Ha: at least one of the βi ≠ 0, i = 2, ..., 5. For the reduced model we have the ANOVA table in Figure 6.16.

Figure 6.16: ANOVA table for the Coincident Regression Lines

From the ANOVA tables in Figures 6.15 and 6.16 we get the partial F statistic (SSE denotes the residual sum of squares of the corresponding model; the numerator divides by the 4 parameters set to zero under H0 and the denominator by the 24 error degrees of freedom of the full model)

F = [(SSE(Reduced) − SSE(Full)) / 4] / [SSE(Full) / 24] = 13.44872,

which has an F(4, 24) distribution. Since the P-value < 0.001, we have enough evidence against H0, and we conclude that the lines are not coincident – i.e. the three lines are not all the same.
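The P-values for both partial F tests can be verified with the F cumulative distribution function CDF.F available under Transform>Compute; the sketch below writes each value into a new (constant) column:

    * Coincidence test: F = 13.44872 on (4, 24) df.
    COMPUTE p_coin = 1 - CDF.F(13.44872, 4, 24).
    * Parallelism test (part 4 below): F = 16.70454 on (2, 24) df.
    COMPUTE p_par = 1 - CDF.F(16.70454, 2, 24).
    EXECUTE.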

4. All three lines will be parallel if all the slopes are the same. Thus, we would have:

β1 = β1 + β4 = β1 + β5 or β4 = β5 = 0

Again we apply a partial F test: Reduced Model: y = β0 + β1x1 + β2z1 + β3z2 + ε

Full Model: y = β0 + β1x1 + β2z1 + β3z2 + β4 x1z1 + β5x1z2 + ε The hypotheses are:

H0: β4 = β5 = 0, Ha: At least one of the βi ≠ 0, i = 4, 5. For the reduced model we have the ANOVA table in Figure 6.17.


Figure 6.17: ANOVA Table for the Parallel Regression Lines

From the ANOVA tables in Figures 6.15 and 6.17 we get the test statistic

F = [(SSE(Reduced) − SSE(Full)) / 2] / [SSE(Full) / 24] = 16.70454,

which has an F(2, 24) distribution. Since the P-value < 0.001, we have enough evidence against H0, and we conclude that the lines are not parallel.

5. The intown location has a much lower baseline price than the suburbs, but the intown price increases faster than the suburban prices as house size increases. The two suburban areas are similar in both baseline sale price and the rate at which price increases with house size.

6.3 Selecting the Best Regression Equation

The purpose of model selection is:

1. To identify important predictor variables (variable screening) for the prediction of the response variable;

2. To improve the accuracy of prediction;

3. To simplify the prediction equation.

Several changes occur as we include more predictor variables in a regression:

1. Prediction improves. R2 (but not necessarily adjusted R2) increases, and se, the residual standard deviation, decreases. Is this improvement substantial?

2. Coefficients describe how the additional variables affect the predicted response. Are these coefficients significantly different from zero and large enough to be of substantial importance?

3. Spurious coefficients may shrink. Do the added variables substantially alter our conclusions regarding the effects of the other predictor variables?

Affirmative answers to any of these questions support keeping the added variable(s) in the model. Negative answers indicate that the variables contribute little and should be left out unless theoretically important.


When practical, all possible regressions should be considered, and the model having the largest R2, the smallest MSE, and so on, chosen. When the number of variables in the maximum model is large, however, the amount of computation required becomes impractical. Several selection procedures therefore examine a reduced number of subsets among which a good model can be found. The search strategy for selecting variables is concerned with determining how many variables, and also which particular variables, should be in the final model. Here are some important procedures:

1. Forward selection procedure

2. Backward selection procedure

3. Stepwise procedure

Forward Selection Procedure

1. Start by fitting all possible one-variable models of the form µy = β0 + β1x1 to the data. For each model, conduct a t-test for a single β parameter with hypotheses H0: β1 = 0 versus Ha: β1 ≠ 0. The independent variable that produces the smallest P-value (or largest |t|-value) is declared the best one-variable predictor of y and enters the model, provided that its P-value does not exceed the specified entry level α.

2. At each subsequent step, add the one variable (among those not yet in the model) having the smallest P-value (or largest |t|-value).

3. Stop adding variables when a stopping rule is satisfied (stop when every variable not yet in the model has a P-value larger than the entry level).

4. The model used is the one containing all predictors that were added.

Backward Elimination Procedure

1. Start with the full model.

2. At each step, remove the one variable having the smallest |t|-value or the largest P-value (the least significant variable).

3. Stop removing variables when a stopping rule is satisfied (stop when all the variables remaining in the model have P-values smaller than the removal level).

4. The model used is the one containing all predictors that were not eliminated.

Stepwise Forward-Backward Procedure

This is a modified version of the forward procedure that permits re-examination, at every step, of the variables incorporated in the model in previous steps. A variable that entered at an early stage may become superfluous at a later stage because of its relationship with other variables subsequently added to the model. To check this possibility, at each step we do a partial F-test for each variable currently in the model, as though it were the most recent variable entered, irrespective of its actual entry point into the model. The variable with the smallest non-significant partial F statistic (if there is such a variable) is removed; the model is refitted with the remaining variables; the partial F's are obtained and similarly examined; and so on. The whole process continues until no more variables can be entered or removed. One should not rely too heavily on stepwise regression: interpret the results carefully, and perform the residual and diagnostic analyses described earlier.


Example: A random sample of data was collected on residential sales in a large city: the selling price y in $1000s, the area x1 in hundreds of square feet, the number of bedrooms x2, the total number of rooms x3, the house age x4 in years and the location z = 0 for in-town and inner suburbs, z = 1 for outer suburbs. Use the variables x1, x2, x3, x4 and z as the predictor variables. The data are available in the SPSS file house2.sav.

1. Use the forward procedure to suggest a best model.

2. Use the backward elimination procedure to suggest a best model.

3. Use the stepwise procedure to suggest a best model.

4. Which of the models previously selected seems to be the best model, and why?

Solution:

1. The SPSS commands for the forward selection procedure are:

(a) Select Analyze>Regression>Linear
(b) Select y as the Dependent variable and x1, x2, x3, x4 and z as Independent variable(s)
(c) Choose Forward as Method
(d) Click Statistics, check R squared change
(e) Click Options, select Use probability of F, and type 0.25 in the Entry box and 1 in the Removal box
(f) Click OK
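Pasting these choices produces syntax roughly like the following (a sketch; it mirrors the dialog settings above and assumes the variable names in house2.sav):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA CHA
      /CRITERIA=PIN(.25) POUT(1.0)
      /DEPENDENT y
      /METHOD=FORWARD x1 x2 x3 x4 z.

The CHA keyword requests the R-squared-change statistics, and PIN/POUT correspond to the Entry and Removal probabilities in the Options dialog.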

The SPSS output contains 5 tables. The Variables entered/removed table shows the variables added and the order in which they are added. The final model contains the predictors x3, x1, x4 and x2 (see Figure 6.18). The Model summary table contains the values of R, R-squared, adjusted R-squared, the standard error of the estimate, the R-squared changes and their significance. The final model has the best values for R, R-squared, adjusted R-squared and standard error of the estimate, but compared with the previous model with predictors x3, x1 and x4 there is not a large improvement (see Figure 6.18).


Figure 6.18: The Coefficient Table for the Forward Procedure

The Coefficients tables give the estimated coefficients for each model, together with the t and P-values for each coefficient in each model (see Figure 6.18). The ANOVA table provides the results of the F-tests for each model (see Figure 6.19). The entry value of 0.25 is chosen for forward selection to allow the procedure to continue through most of the subset sizes. By inspecting the P-values we can notice that the selection would have stopped at the two-variable model x3, x1 had the entry value been set to 0.05.

Figure 6.19: ANOVA Table for the Forward Procedure


Figure 6.20: The Excluded Variables Table for the Forward Procedure

The Excluded variables table summarizes, for each model, the variables that have not yet entered the model (see Figure 6.20).

2. The SPSS commands for the backward selection procedure are:

(a) Select Analyze>Regression>Linear
(b) Select y as the Dependent variable and x1, x2, x3, x4 and z as Independent variable(s)
(c) Choose Backward as Method
(d) Click Statistics, check R squared change
(e) Click Options, select Use probability of F, and type 0.05 in the Entry box and 0.1 in the Removal box
(f) Click OK
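The corresponding syntax sketch, under the same assumptions, changes only the method and the entry/removal probabilities:

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA CHA
      /CRITERIA=PIN(.05) POUT(.10)
      /DEPENDENT y
      /METHOD=BACKWARD x1 x2 x3 x4 z.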

Figure 6.21: Partial Output of Backward Procedure


The Variables entered/removed table shows that the final model contains the predictors x3 and x1 (see Figure 6.21). The last model does not have the best R-squared or adjusted R-squared (see the Model Summary table in Figure 6.21). The coefficients and their t and P-values are given in the Coefficients table displayed in Figure 6.22. We can notice that the P-values (significance) for the coefficients corresponding to x3 and x1 are 0.000 < 0.1 and 0.005 < 0.1 respectively, so neither variable meets the removal criterion.

Figure 6.22: The Coefficients Table for the Backward Procedure

3. The SPSS commands for the stepwise selection procedure are:

(a) Select Analyze>Regression>Linear
(b) Select y as the Dependent variable and x1, x2, x3, x4 and z as Independent variable(s)
(c) Choose Stepwise as Method
(d) Click Statistics, check R squared change
(e) Click Options, select Use probability of F, and type 0.149 in the Entry box and 0.15 in the Removal box
(f) Click OK
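Again, a syntax sketch under the same assumptions:

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA CHA
      /CRITERIA=PIN(.149) POUT(.15)
      /DEPENDENT y
      /METHOD=STEPWISE x1 x2 x3 x4 z.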

The Variables entered/removed table shows that the final model contains the predictors x3, x1, and x4 (see Figure 6.23). The last model has the best R-squared and adjusted R-squared (see the Model Summary table in Figure 6.23).

4. Comparing the Model Summary tables for the models found using the forward, backward and stepwise procedures, we can see that the model found with the stepwise procedure, containing the variables x3, x1, and x4, is a simple model that has high R-squared and adjusted R-squared values and the smallest standard error. This appears to be the best model. The model found using the forward procedure is larger, but the improvement in the R-squared and adjusted R-squared values after adding the extra predictor x2 is not significant.


Figure 6.23: Partial Output of Stepwise Procedure