
Basic Data Analysis Guidelines for Research Students

Isaac V. Gusukuma, Ph.D., LMSW-IPR, ACSW

University of Mary Hardin-Baylor
Social Work Program

January 30, 2012

Reproduction of any part of the guidelines is not permitted without the author’s permission.

August, 2008


Table of Contents

Introduction
    Organization of the Guide
Basic Guidelines for Constructing a Survey Question
    Constructing Your Response Categories - Establishing Your Level of Measurement
    Associating Response Categories of a Question to Statistical Procedures
Basic Guidelines for Analyzing Data
Data Analysis: Making Sense of Those Numbers
    Check To Be Sure Your Data Is Accurate
    Conducting a Frequencies Analysis for Each Variable
    Example of a Survey Question and SPSS Frequencies Output for the Variable SEX
Univariate Data Analysis
    Analysis of a Nominal Level Variable
        Example of a survey question and SPSS output for a nominal level variable
    Analysis of an Ordinal Level Variable
        Example of a survey question and SPSS output for an ordinal level variable
    Analysis of an Interval/Ratio Level Variable
        Example of a survey question and SPSS output for an interval level variable
Bivariate (2 variables) Data Analysis
    Chi Square (Goodness of Fit) Test
        Example 1 - Chi square test
        Example 2 - Chi square test
    t-Test (Difference of Means Test)
        Example 1 - One sample t-test
        Example 2 - Independent samples t-test
        Example 3 - Paired samples t-test
    Analysis of Variance (ANOVA) Test
        Example of a one-way ANOVA
    Pearson’s Product Moment Correlation (r)
        Example - Pearson’s (r)
Conclusion
Appendices: SPSS Output Screens
    Appendix 1 Frequencies SPSS Screens
    Appendix 2 Crosstab and Chi Square SPSS Screens
    Appendix 2 t-Test SPSS Screens
        One Sample t-Test Screens
        Independent Samples t-Test Screens
        Paired Samples t-Test Screens
    Appendix 3 Analysis of Variance (one-way) SPSS Screens
    Appendix 4 Pearson’s r SPSS Screens
References


Basic Data Analysis Guidelines for Research Students

Introduction

Research and statistics are inseparable. Knowing this is one thing. Understanding and using

this relationship is another, especially for a research student. A common oversight among research students is waiting until late in the research process to consider the relationship between the problem statement, research question, hypotheses, the kinds of data one will be collecting, and the statistical analysis of the data.

This basic guide for analyzing data is presented to encourage you to consider early rather

than later in the research process the relationship that exists between questions asked on a

survey, the response categories and data that is generated, and statistical procedures available to

create some sense from the collected data. Thinking about data and its analyses should be part of

the first steps in the development of a research proposal and like many other parts of the research

process should be continually revisited, updated, and refined as your project draws to a

conclusion.

This guide provides examples of univariate (single variable) and bivariate (two variables)

analysis. It begins by encouraging you to be certain that your data set is accurate and “error

free,” then proceeds to discuss several basic univariate and bivariate data analysis procedures.

Univariate procedures are essentially what you already know as descriptive statistics. Bivariate

statistical procedures presented in this guide include: the chi square test, the t-test, analysis of

variance (ANOVA), and the Pearson’s r (correlation). This guide does not discuss multivariate

(more than two variables) statistical analysis procedures.

Organization of the Guide

This guide begins with two very brief sections on constructing questions for a survey and

general reminders about data analysis. The points in these two sections should serve as “memory

joggers” as you begin to consider the relationship between your research design and statistical

analysis. The Data Analysis section re-introduces you to the important task of ensuring your data

is “clean” by conducting a “Frequencies” procedure. Once you are fairly certain your data is

accurate, you can begin the statistical analysis procedures, initially conducting univariate data

analysis then moving on to bivariate procedures.

This guide for data analysis assumes an understanding of basic statistics and basic skills and

experience with SPSS™.


Basic Guidelines for Constructing a Survey Question

Though this guide will not present all aspects of designing a research project, you may find it

helpful to have a few reminders about constructing questions for a survey instrument. This will

enable you to be mindful that how you ultimately construct a question and its response categories determines what you can do, statistically, with it.

When constructing survey questions or when selecting questions to use from a standardized

instrument, you may want to keep in mind the following questions:

1. What’s the purpose of my research? Am I trying to describe, explain, predict, or evaluate some occurrence? Given the purpose of my research, will I need to generate descriptive statistics, inferential statistics, or both?

2. For each question on a survey instrument, does this survey question provide information

about the independent variable(s), the dependent variable, the control variables, or is this

question on the survey to provide some demographic information about the respondents?

3. Which of the variables/questions do I intend to analyze together, e.g., gender of the

respondents by their education level?

4. What is the best or most appropriate level of measurement (nominal, ordinal,

interval/ratio) for this variable? Should I create response categories so that I get nominal,

ordinal, or interval/ratio level data?

5. Will I have a random or nonrandom sample and is my sample of sufficient size that I can

assume the scores approach that of a normal distribution?

6. What is my anticipated sample size and will I have a sample of sufficient size such that I

can conduct the statistical procedures I have planned to run?

How you answer these questions will, to a degree, influence the questions you ask on your

survey and help establish the response categories for the questions. Most importantly they will

influence the kinds of statistical procedures you are able to conduct for your study.

Constructing Your Response Categories - Establishing Your Level of Measurement

If you are constructing your data collection instrument, you have the opportunity to establish

the level of measure for many of your variables. As an example, the variable education can be

constructed in such a way that your data may be a nominal, ordinal, or an interval/ratio measure.

Education as a nominal measure:

Do you have a high school diploma?

____Yes ____No


Education as an ordinal measure:

What is your current class standing?

___Senior ___Junior ___Sophomore ___Freshman

Education as an interval/ratio measure:

How many years of education do you have?

______Years

As you examine the examples above of how you could construct a question about one’s level

of education, you should recognize that designing and constructing a survey instrument is both a

science and an art, and you should think of a question in terms of its response categories and

level of measure. The next section further illustrates the importance of the response categories of

your questions.

Associating Response Categories of a Question to Statistical Procedures

This section presents the relationship between level of measure of the response categories of

a question and possible basic statistical procedures you can conduct. As noted earlier and

illustrated in sections still to come, you should think in terms of both univariate and bivariate

data analysis. The tables below provide a basic guide for the types of univariate and bivariate

data analysis you can conduct, based on the measurement level of your variables. In the tables

below, measurement level refers to the response categories for a given question on a survey.

Table 1: Univariate Procedures

Measurement Level | Basic Statistical Procedures
Nominal measures (EX: gender; ethnicity; religious preference) | Mode, Percentages, Ratios
Ordinal measures (EX: socioeconomic status as high, medium, and low; class standing as Senior, Junior, Sophomore, Freshman) | Mode, Median, Percentages, Ratios, Quartiles
Interval/ratio measures (EX: age in years; income in dollars; test scores) | Mode, Median, Percentages, Ratios, Quartiles, Mean, Standard deviation
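The progression in Table 1 can also be illustrated outside SPSS. The following is a minimal sketch in Python using only the standard library. The response data, variable names, and the `percentages` helper are hypothetical and invented for illustration; they are not part of the guide's data sets.

```python
import statistics
from collections import Counter


def percentages(values):
    """Return {category: percent of all responses}, rounded to one decimal."""
    counts = Counter(values)
    total = len(values)
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Hypothetical responses, invented for illustration only.
religion = ["Protestant", "Catholic", "Protestant", "None", "Protestant"]  # nominal
class_standing = [1, 2, 2, 3, 4, 4, 4]  # ordinal codes: 1 = Freshman ... 4 = Senior
age_in_years = [19, 21, 22, 22, 25, 30]  # interval/ratio

# Nominal: only the mode and percentages (or ratios) are meaningful.
nominal_mode = statistics.mode(religion)
nominal_percentages = percentages(religion)

# Ordinal: the median and quartiles also become meaningful.
ordinal_median = statistics.median(class_standing)
ordinal_quartiles = statistics.quantiles(class_standing, n=4)

# Interval/ratio: the mean and standard deviation also become meaningful.
ratio_mean = statistics.mean(age_in_years)
ratio_sd = statistics.stdev(age_in_years)  # sample standard deviation
```

Each step down the table adds procedures without removing the earlier ones, which is why an interval/ratio variable supports every statistic listed for nominal and ordinal variables.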

In Table 2 Bivariate Statistical Procedures, you will notice a row and column identified as

dichotomous. Dichotomous variables are a special category of variables that only have two

meaningful response categories. Dichotomous variables, for the purpose of this guide, will be

treated as though they are nominal level variables. Examples of dichotomous variables include


Sex (Male/Female), US Citizen (Yes/No), Race (White/Nonwhite), and Religion

(Christian/NonChristian).

Table 2 also provides you with recommendations about statistical procedures you may desire

to conduct when examining two variables. Table 2 is read by looking at the intersection of the

row and column that represents the level of measure of your two variables. Thus, if you have two

interval level variables (interval x interval) you should probably conduct a Pearson’s r

(correlation).

Table 2: Bivariate Statistical Procedures

Rows give the measurement level of the first variable; columns give the measurement level of the second variable.

First Variable | Dichotomous | Nominal | Ordinal | Interval/Ratio
Dichotomous | Chi square; Phi | | |
Nominal | Chi square; Cramer’s V; Lambda | Chi square; Cramer’s V; Lambda | |
Ordinal | Chi square; t-test (for interval-like data) | ANOVA, one-way (for interval-like data) | Gamma; Somers' d; Tau B; Tau C; Spearman’s rho; Pearson’s r (for interval-like data) |
Interval/Ratio | t-test for independent, paired, and one-sample | ANOVA, one-way | ANOVA, one-way; Pearson’s r (for interval-like data) | Pearson’s r
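Reading Table 2 amounts to a lookup on the pair of measurement levels. The sketch below encodes the table as a Python dictionary, as read from the table above. The names `LEVELS`, `TABLE_2`, and `suggested_procedures` are hypothetical, and the sketch assumes that the order of the two variables does not matter, which the table itself does not state.

```python
# Measurement levels in the order used by Table 2.
LEVELS = ["dichotomous", "nominal", "ordinal", "interval/ratio"]

# Keys are (higher level, lower-or-equal level); values are the listed procedures.
TABLE_2 = {
    ("dichotomous", "dichotomous"): ["Chi square", "Phi"],
    ("nominal", "dichotomous"): ["Chi square", "Cramer's V", "Lambda"],
    ("nominal", "nominal"): ["Chi square", "Cramer's V", "Lambda"],
    ("ordinal", "dichotomous"): ["Chi square", "t-test (for interval-like data)"],
    ("ordinal", "nominal"): ["ANOVA, one-way (for interval-like data)"],
    ("ordinal", "ordinal"): [
        "Gamma", "Somers' d", "Tau B", "Tau C", "Spearman's rho",
        "Pearson's r (for interval-like data)",
    ],
    ("interval/ratio", "dichotomous"): ["t-test (independent, paired, or one-sample)"],
    ("interval/ratio", "nominal"): ["ANOVA, one-way"],
    ("interval/ratio", "ordinal"): [
        "ANOVA, one-way", "Pearson's r (for interval-like data)",
    ],
    ("interval/ratio", "interval/ratio"): ["Pearson's r"],
}


def suggested_procedures(level_a, level_b):
    """Return the Table 2 entry for two measurement levels, in either order."""
    # Assumption: the table is symmetric, so the pair is ordered highest level first.
    first, second = sorted((level_a, level_b), key=LEVELS.index, reverse=True)
    return TABLE_2[(first, second)]
```

For example, `suggested_procedures("interval/ratio", "interval/ratio")` returns the Pearson's r entry, matching the interval x interval case described in the text.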

Basic Guidelines for Analyzing Data

Before you actually begin to conduct your data analysis, there are a few preliminary points to

consider that may impact your statistical analysis. The statements below are for you to consider

once you have collected your surveys and as you enter and begin the statistical analysis of your

data.

1. “Junk in, junk out,” meaning if your data is not entered accurately (is not “clean”), the

conclusions drawn from your statistical analysis may not be correct.

2. You are generally more likely to find statistical significance with larger samples. Thus, if

you have a small sample (exactly what “small” means will need to be covered in a

research methods course) you are less likely to find significance, which leads to the next

point.


3. While an alpha level of .05 (level of significance, α = .05) is standard for most social

science research, you may decide to establish either a higher or lower alpha based on

your research design, question, and sample size. Consult with your professor or a

statistical consultant about the alpha to establish for your analysis. The important point to

remember is that you should establish your alpha before you conduct your statistical

analysis.

4. In statistical analysis a relationship is either significant or not significant. There is no

relationship that can be described as “highly significant” or “strongly significant.” If you

have established your alpha as .05, then whether the computed probability (p) is .049

or .0001, you can only state that you have a “significant” relationship.

5. Remember that a high or “strong” correlation is not the same as causation.

Data Analysis: Making Sense of Those Numbers

Check To Be Sure Your Data Is Accurate

One of the first steps in data analysis is to ensure the information in your data file is accurate.

In other words you should have some level of certainty the data entered into your SPSS data file

are correct. One way to check for errors in data entry is to run the Frequencies procedure. This

will help you identify one type of data entry error, specifically when you enter a numeric value

that does not represent a response code. For example, for the variable Sex, you have the numeric

codes of 1 for “Male” respondents, 2 for “Female” respondents, or 99 representing responses that

are “Not Answered.” Upon running the Frequencies procedure you note that a 7 has been entered

for the variable. The 7 is a data entry error because you should only have codes of 1, 2, or 99 for

the variable Sex.

The Frequencies procedure, however, will only help you identify one type of data entry error.

The output from a Frequencies procedure will not identify data entry errors where, for the

variable Sex, you entered a code of 1 for a respondent when it really should have been a 2. In

other words, you miscoded the respondent as “Male” instead of “Female” but the numeric code

you entered, a code of 1, is a valid code for the variable Sex. Identifying and correcting this and

other types of data entry errors will require other procedures and processes on the part of the

researcher or person entering the data.


Conducting a Frequencies Analysis for Each Variable

Check for the following:

a. Is the total number of responses, the number of records entered, correct for each variable,

i.e., if you entered 40 records, do you have 40 in the data file for each variable - good

responses plus those you have identified as “system missing?”

b. Are all the numeric codes entered correctly, i.e., if you are only supposed to have 1’s for

Males, 2’s for Females, and 99’s for Not Answered (NA), did you check to ensure you

don’t have any other numeric value entered for that variable?

c. If you note errors in the data, correct them before you conduct your statistical analysis,

then rerun “Frequencies” for those variables where corrections were made.

d. Frequencies is not appropriate for string variables that have alphanumeric characters

such as street addresses and names.
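Checks (a) and (b) above can be sketched in a few lines of Python as a complement to the SPSS Frequencies procedure. The function name `frequency_check`, the constant `VALID_SEX_CODES`, and the ten sample records are hypothetical; the codes 1, 2, and 99 follow the coding described for the variable Sex.

```python
from collections import Counter

# Valid codes for the variable Sex: 1 = Male, 2 = Female, 99 = Not Answered.
VALID_SEX_CODES = {1, 2, 99}


def frequency_check(values, valid_codes, expected_records):
    """Count each code and flag any value that is not a valid response code."""
    counts = Counter(values)
    invalid = {code: n for code, n in counts.items() if code not in valid_codes}
    return {
        "record_count_ok": len(values) == expected_records,  # check (a)
        "frequencies": dict(sorted(counts.items())),
        "invalid_codes": invalid,  # check (b)
    }


# Hypothetical data file of 10 records containing one keying error (a 7).
sex = [1, 2, 2, 1, 7, 2, 99, 1, 2, 2]
report = frequency_check(sex, VALID_SEX_CODES, expected_records=10)
```

As with the Frequencies procedure itself, this only catches values outside the valid codes. A respondent miscoded as 1 instead of 2 would pass the check.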

Example of a Survey Question and SPSS Frequencies Output for the Variable SEX

Example of a survey question about the respondent’s sex with pre-coded responses:

1. What is your sex?
____ 1 Male
____ 2 Female

Example of SPSS Frequencies output for the variable Sex:

Statistics: RESPONDENTS SEX

N Valid | 40
N Missing | 0

RESPONDENTS SEX

Value | Frequency | Percent | Valid Percent | Cumulative Percent
0 | 1 | 2.5 | 2.5 | 2.5
1 MALE | 17 | 42.5 | 42.5 | 45.0
2 FEMALE | 22 | 55.0 | 55.0 | 100.0
Total | 40 | 100.0 | 100.0 |

Note: The value 0 is a data entry error identified by running Frequencies, as there should be only 1’s and 2’s entered.

Though the Frequencies procedure will not totally eliminate the problem of data entry error, it will help reduce the error in your data. The Frequencies procedure can also generate basic


descriptive statistics that will allow you to both check your data for errors and begin to develop a

sense of the distribution of scores for your variables. The next section discusses univariate

statistical procedures that can be conducted as you are running the Frequencies procedure.

Univariate Data Analysis

Univariate data analysis is the analysis of a single variable as opposed to conducting data

analysis using two (bivariate) or more (multivariate) variables. The term “descriptive statistics”

is most often associated with summarizing the characteristics of a variable or a set of variables.

Another general term, “measures of central tendency,” is also used as a reference to the statistical

procedures associated with describing the distribution of values of the responses to a single

variable. Measures of central tendency include the mode, median, and mean. Other information

about the distribution of scores in a variable that further assist with describing the variable

include the range, upper and lower limits, variance, standard deviation, and confidence interval.

Analysis of a Nominal Level Variable

A nominal variable is a categorical variable that is measured in such a way that the categories

indicate differences among respondents with no hierarchy or rank order implied in those

differences. When constructing a survey question with nominal level response categories, the

response categories should be mutually exclusive and exhaustive. Common examples of nominal

level variables are Sex (Male/Female), Ethnic Background (Anglo, Hispanic, African American,

Asian, Pacific Islander, etc.), and Religion (Protestant, Catholic, Jewish, Islamic, Buddhist, etc.).

The following statistics may be appropriate for nominal variables/data:
o Frequencies (mode)
o Percentages
o Ratios

Example of a survey question and SPSS output for a nominal level variable

Example of a survey question and nominal response categories with pre-coded response categories:

1. What is your religious preference?
___1 Protestant
___2 Catholic
___3 Jewish
___4 None
___5 Other

Example of SPSS outputs for the variable Religious Preference:


Statistics: RELIGIOUS PREFERENCE

N Valid | 1477
N Missing | 9
Mode | 1

RELIGIOUS PREFERENCE

Value | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: 1 PROTESTANT | 886 | 59.6 | 60.0 | 60.0
Valid: 2 CATHOLIC | 367 | 24.7 | 24.8 | 84.8
Valid: 3 JEWISH | 26 | 1.7 | 1.8 | 86.6
Valid: 4 NONE | 146 | 9.8 | 9.9 | 96.5
Valid: 5 OTHER | 52 | 3.5 | 3.5 | 100.0
Valid: Total | 1477 | 99.4 | 100.0 |
Missing: 9 NA | 9 | .6 | |
Total | 1486 | 100.0 | |

Example of SPSS pie graph with percentages for the variable Religious Preference:

Brief Interpretation of an Analysis of the Variable Religious Preference Using the Mode

The 1,486 respondents in this survey most often reported they were of a Protestant faith followed by those reporting they were of the Catholic faith.

Brief Interpretation of an Analysis of the Variable Religious Preference Using Percentages


Of the 1,486 total respondents, 59.6% reported they were Protestant, followed by those reporting they were Catholic (24.7%) and Jewish (1.7%), while 9.8% reported they had no religious preference, 3.5% noted they had another religious preference, and 0.6% were “missing,” meaning they did not respond to the question.

Brief Interpretation of an Analysis of the Variable Religious Preference Using a Ratio

Slightly less than three of every five respondents reported they were of the Protestant faith.
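The Percent, Valid Percent, and Cumulative Percent columns in the SPSS output can be reproduced from the frequencies alone. The sketch below does this in Python for the Religious Preference counts shown above; the function name `frequency_table` and its dictionary layout are hypothetical, not SPSS terminology.

```python
def frequency_table(counts, missing):
    """Build percent, valid percent, and cumulative percent for each category.

    counts:  {label: frequency} for valid responses, in display order
    missing: number of missing responses
    """
    valid_n = sum(counts.values())
    total_n = valid_n + missing
    rows = []
    cumulative = 0
    for label, n in counts.items():
        cumulative += n
        rows.append({
            "label": label,
            "frequency": n,
            # Percent uses all respondents, including missing, as the base.
            "percent": round(100 * n / total_n, 1),
            # Valid percent uses only respondents who answered the question.
            "valid_percent": round(100 * n / valid_n, 1),
            "cumulative_percent": round(100 * cumulative / valid_n, 1),
        })
    return rows


# Frequencies taken from the Religious Preference output above.
religion_counts = {
    "PROTESTANT": 886, "CATHOLIC": 367, "JEWISH": 26, "NONE": 146, "OTHER": 52,
}
table = frequency_table(religion_counts, missing=9)
```

The difference between the two bases explains why the interpretation reports 59.6% Protestant (of all 1,486 respondents) while the Valid Percent column shows 60.0% (of the 1,477 who answered).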

Analysis of an Ordinal Level Variable

An ordinal variable is a categorical variable in which there is some inherent rank, hierarchy,

or order to the categories. The concept of “rank” in this instance does not imply that respondents

in a higher category are in some way better than other respondents. Instead, hierarchy or rank

means that the established categories allow the respondents to be arranged along some dimension

or in some order. Common examples of ordinal level variables include Economic Status (Low,

Middle, High), Class Standing (Senior, Junior, Sophomore, Freshman), and attitudinal variables,

such as Satisfaction with Services (High, Medium, Low).

The following statistics may be appropriate for ordinal variables/data:
o Frequencies (mode, median)
o Percentages
o Quartiles

Example of a survey question and SPSS output for an ordinal level variable

Example of a survey question and ordinal response categories:

1. What is your annual family income?

___1 Less than $1,000
___2 $1,000-2,999
___3 $3,000-3,999
___4 $4,000-4,999
___5 $5,000-5,999
___6 $6,000-6,999
___7 $7,000-7,999
___8 $8,000-9,999
___9 $10,000-12,499
___10 $12,500-14,999
___11 $15,000-17,499
___12 $17,500-19,999
___13 $20,000-22,499
___14 $22,500-24,999
___15 $25,000-29,999
___16 $30,000-34,999
___17 $35,000-39,999
___18 $40,000-49,999
___19 $50,000-59,999
___20 $60,000-74,999
___21 $75,000+

Example of SPSS outputs for the variable Family Income

Statistics: TOTAL FAMILY INCOME (N=1486)

N Valid | 1405
N Missing | 81
Median | 16.00
Mode | 18.00
Percentiles 25 | 11.00
Percentiles 50 | 16.00
Percentiles 75 | 19.00

TOTAL FAMILY INCOME

Value | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: 1 LT $1000 | 12 | .8 | .9 | .9
Valid: 2 $1000-2999 | 17 | 1.1 | 1.2 | 2.1
Valid: 3 $3000-3999 | 16 | 1.1 | 1.1 | 3.2
Valid: 4 $4000-4999 | 17 | 1.1 | 1.2 | 4.4
Valid: 5 $5000-5999 | 32 | 2.2 | 2.3 | 6.7
Valid: 6 $6000-6999 | 13 | .9 | .9 | 7.6
Valid: 7 $7000-7999 | 21 | 1.4 | 1.5 | 9.1
Valid: 8 $8000-9999 | 38 | 2.6 | 2.7 | 11.8
Valid: 9 $10000-12499 | 73 | 4.9 | 5.2 | 17.0
Valid: 10 $12500-14999 | 62 | 4.2 | 4.4 | 21.4
Valid: 11 $15000-17499 | 68 | 4.6 | 4.8 | 26.3
Valid: 12 $17500-19999 | 63 | 4.2 | 4.5 | 30.7
Valid: 13 $20000-22499 | 70 | 4.7 | 5.0 | 35.7
Valid: 14 $22500-24999 | 70 | 4.7 | 5.0 | 40.7
Valid: 15 $25000-29999 | 103 | 6.9 | 7.3 | 48.0
Valid: 16 $30000-34999 | 110 | 7.4 | 7.8 | 55.9
Valid: 17 $35000-39999 | 80 | 5.4 | 5.7 | 61.6
Valid: 18 $40000-49999 | 141 | 9.5 | 10.0 | 71.6
Valid: 19 $50000-59999 | 93 | 6.3 | 6.6 | 78.2
Valid: 20 $60000-74999 | 93 | 6.3 | 6.6 | 84.8
Valid: 21 $75000+ | 130 | 8.7 | 9.3 | 94.1
Valid: 22 REFUSED | 83 | 5.6 | 5.9 | 100.0
Valid: Total | 1405 | 94.5 | 100.0 |
Missing: 98 DK | 56 | 3.8 | |
Missing: 99 NA | 25 | 1.7 | |
Missing: Total | 81 | 5.5 | |
Total | 1486 | 100.0 | |

Brief Interpretation of an Analysis of the variable Family Income

Though the annual family income most often reported was between $40,000 and $49,999, the median annual family income for the 1,405 valid respondents was between $30,000 and $34,999. Twenty-five percent of the families reported an annual income of less than $17,500 while the upper 25% reported an annual income of more than $50,000.
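For an ordinal variable reported in categories, the median and quartiles are category codes, found by locating the category in which the cumulative percent first reaches 25, 50, or 75. The sketch below applies that rule in Python to the Total Family Income frequencies. The function name `category_at_percentile` is hypothetical, and this simple cumulative rule is an assumption for illustration rather than a statement of SPSS's exact percentile algorithm. The frequencies are used as they appear in the output, including code 22 (REFUSED), which that output counts among the valid responses.

```python
def category_at_percentile(frequencies, percentile):
    """Return the first category code whose cumulative percent reaches `percentile`."""
    total = sum(frequencies.values())
    cumulative = 0
    for code in sorted(frequencies):
        cumulative += frequencies[code]
        if 100 * cumulative / total >= percentile:
            return code
    raise ValueError("percentile must be between 0 and 100")


# Valid frequencies by category code, from the Total Family Income output.
income_frequencies = {
    1: 12, 2: 17, 3: 16, 4: 17, 5: 32, 6: 13, 7: 21, 8: 38,
    9: 73, 10: 62, 11: 68, 12: 63, 13: 70, 14: 70, 15: 103, 16: 110,
    17: 80, 18: 141, 19: 93, 20: 93, 21: 130, 22: 83,
}

median_code = category_at_percentile(income_frequencies, 50)
lower_quartile_code = category_at_percentile(income_frequencies, 25)
upper_quartile_code = category_at_percentile(income_frequencies, 75)
# The mode is the category code with the largest frequency.
mode_code = max(income_frequencies, key=income_frequencies.get)
```

The resulting codes must then be translated back into income ranges using the response categories of the survey question, which is what the interpretation above does when it reports the median as between $30,000 and $34,999.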

Note: In the Statistics output for Total Family Income, the numeric values (for example, Median 16.00 and Mode 18.00) are codes that represent income groups. See the Total Family Income frequency table for the income range of each code.

Analysis of an Interval/Ratio Level Variable

Unlike nominal and ordinal variables, which are categorical, interval and ratio level variables are numeric or scaled variables. For these variables, the numbers are ordered and the distance between adjacent numbers is the same across the scale (e.g., $5.00 is higher than $4.00 by the same amount as $99.00 is higher than $98.00). Interval variables are like ratio variables except that interval variables do not have a true zero: a value of zero does not mean the absence of the characteristic, so ratios between interval values are not meaningful. For example, age is a ratio level variable because an age of zero is a true zero point and someone 20 years of age is twice as old as someone who is age 10. IQ score is an interval level variable because an IQ of 100 does not mean a person has twice the intelligence of a person with an IQ of 50. Statistically, however, interval and ratio level data are treated the same way. Variables often used in social research that are interval/ratio level include number of children in a family, number of therapy or counseling sessions, number of times married, and number of days hospitalized.

The following statistics may be appropriate for interval/ratio variables/data:
o Frequencies (mode, median, mean)
o Quartiles
o Standard deviation

Example of a survey question and SPSS output for an interval level variable

1. How old were you when you were first married?
____ Years of age

Statistics: AGE WHEN FIRST MARRIED

N Valid | 590
N Missing | 896
Mean | 22.64
Median | 22.00
Mode | 21
Std. Deviation | 4.710
Minimum | 13
Maximum | 57
Percentiles 25 | 19.00
Percentiles 50 | 22.00
Percentiles 75 | 25.00

Brief Interpretation of an Analysis of Age When First Married


When asked their age when they were first married, 590 of 1,486 respondents had a valid response. The average age of first marriage was 22.64 years (sd = 4.710), while the median age of first marriage was 22 years. The most common or frequently reported age of first marriage was 21 years of age. The youngest age reported of first marriage was 13 years and the oldest was 57 years of age. The lower twenty-five percent of the respondents noted that they first married by age 19 while the upper twenty-five percent reported they were married at or older than the age of 25 years.

While univariate analysis of data is an important and helpful procedure to describe a variable,

even more information about the data can be gathered by conducting bivariate data analysis. The

next section presents a discussion on the more common types of bivariate data analysis

procedures.


Bivariate (2 variables) Data Analysis

The more common bivariate statistical analysis procedures (ones you will most likely use)

include the chi square, t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). Each

procedure is discussed in the sections below, each of which includes the assumptions for the statistical

procedure, conditions for selecting a particular statistical procedure, and an SPSS example with a

brief statement describing the analysis of the SPSS output.

Chi Square (Goodness of Fit) Test

The chi square is a nonparametric test for the bivariate analysis of two nominal level

variables. When conducting the chi square test, the data is often displayed as a cross tabulation

(crosstab) or contingency table. The chi square is actually the name of the test statistic used to

determine if there is a significant relationship between the two nominal variables. The specific

statistical procedure is discussed in most statistical textbooks. In SPSS the crosstabs procedure

may also be used to determine the association between two ordinal level variables and

nominal/interval level variables though doing so requires specific and special data analysis

procedures. Consult with your professor or a statistician if you think you will need to conduct an

analysis of ordinal or interval level variables using the crosstabs.

Assumptions of the chi square:

- Probability sampling design.
- 80% or more of the cells in your contingency table should have an expected cell frequency of 5 or greater.
- Observations are independent, meaning you should not use the chi square test for matched pairs.
- You should apply Yates’ correction factor for 2x2 contingency tables when the cell frequencies are 5 or more, but less than 10.
- You should apply Fisher’s exact test when the sample size for a 2x2 contingency table is 20 or less.
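The expected cell frequency rule can be checked by hand before you rely on a chi square result. The following is a minimal Python sketch of that check, offered only as an illustration outside SPSS. The function names are assumptions made for this example, and the counts are those of the general happiness by marital status table in Example 1 of this section.

```python
def expected_counts(table):
    """Return the expected cell counts under independence.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5):
    """Return (count, share) of cells whose expected count is below threshold."""
    expected = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in expected if e < threshold)
    return small, small / len(expected)


# Observed counts from Example 1 (rows: very happy, pretty happy, not too happy;
# columns: married, widowed, divorced, separated, never married).
observed = [
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
]

small, share = small_expected_cells(observed)
smallest = min(e for row in expected_counts(observed) for e in row)
print(f"{small} cells ({share:.1%}) have an expected count below 5")
print(f"Minimum expected count: {smallest:.2f}")
```

For these counts the check finds 2 of the 15 cells (13.3%) below 5, with a minimum expected count of 1.29, which agrees with the footnote of the SPSS Chi-Square Tests table in Example 1. Because fewer than 20% of the cells fall below 5, the 80% guideline is met.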

SPSS procedures to request in the Crosstabs dialogue box (see Appendix 2 Crosstab and Chi Square SPSS Screens):

- Click the “Statistics” button and check the “Chi square” box in the upper left corner.
- Check the appropriate measure of association box (most likely “Phi and Cramer’s V”).
- Click the “Cells” button and check the row and/or column and/or total percentages box (based on how you prefer to look at/analyze the table).
- For Residuals, standardized residuals are recommended. Cells with standardized residual values of greater than +1.0 may reveal the cell that contributes to a significant chi square test.

Measures of association for 2 nominal variables:

- Phi: for 2x2 tables
- Cramer’s V: for other tables (2x3, 3x3, etc.)
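Both measures can be derived from the chi square value, the number of valid cases, and the size of the table. The sketch below uses the standard textbook formulas (Phi as the square root of chi square divided by N, and Cramer’s V with N multiplied by one less than the smaller of the number of rows and columns). The function names are assumptions made for this illustration, and the input values are taken from the SPSS output of the two examples in this section.

```python
import math


def phi(chi_square, n):
    """Phi coefficient: sqrt(chi-square / N). Intended for 2x2 tables."""
    return math.sqrt(chi_square / n)


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


# Example 1: 3x5 table, chi-square = 29.537, N = 494
print(round(cramers_v(29.537, 494, 3, 5), 3))

# Example 2: 2x2 table, chi-square = 5.301, N = 558
print(round(phi(5.301, 558), 3))
print(round(cramers_v(5.301, 558, 2, 2), 3))
```

For a 2x2 table the two formulas give the same value, which is why SPSS reports identical Phi and Cramer’s V figures in Example 2.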

Example 1 - Chi square test

The contingency table and data analysis examine the relationship between general happiness and marital status of the respondents. Examples of possible survey questions are noted below.

1. What is your current marital status?
___1 Married ___2 Widowed ___3 Divorced ___4 Separated ___5 Never married

2. What is your current level of general happiness with life?
___ Very happy ___ Pretty happy ___ Not too happy

SPSS Output of a Contingency Table and Chi Square Test

GENERAL HAPPINESS by MARITAL STATUS (N=494)

                                          MARITAL STATUS
GENERAL HAPPINESS                         MARRIED  WIDOWED  DIVORCED  SEPARATED  NEVER MARRIED   Total
VERY HAPPY     Count                           81        8        12          1             15     117
               % within Marital Status      33.5%    16.3%     16.0%       9.1%          12.8%   23.7%
               Std. Residual                  3.1     -1.1      -1.4       -1.0           -2.4
PRETTY HAPPY   Count                          143       33        52          8             83     319
               % within Marital Status      59.1%    67.3%     69.3%      72.7%          70.9%   64.6%
               Std. Residual                 -1.1       .2        .5         .3             .9
NOT TOO HAPPY  Count                           18        8        11          2             19      58
               % within Marital Status       7.4%    16.3%     14.7%      18.2%          16.2%   11.7%
               Std. Residual                 -2.0       .9        .7         .6            1.4
Total          Count                          242       49        75         11            117     494
               % within Marital Status     100.0%   100.0%    100.0%     100.0%         100.0%  100.0%

Chi-Square Tests

                                Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             29.537    8                    .000
Likelihood Ratio               30.445    8                    .000
Linear-by-Linear Association   22.369    1                    .000
N of Valid Cases                  494

a. 2 cells (13.3%) have expected count less than 5. The minimum expected count is 1.29.

What to look for in the output above:

- Standardized residuals > +1.0
- Level of significance (p) is < .05
- Number of cells (%) that have an expected frequency of less than 5
- Chi square value and degrees of freedom (df)

Symmetric Measures

                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .245           .000
                     Cramer's V    .173           .000
N of Valid Cases                    494

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Brief Interpretation of the Analysis of General Happiness with the Marital Status of the Respondents

There is a significant (χ² = 29.537, df = 8, p < .01) but weak association (Cramer’s V = .173) between one’s level of general happiness and marital status. Persons who are married are significantly more likely to report they are very happy, while persons who have never married are more likely to report they are not too happy.
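To see where the figures in this interpretation come from, the chi square value, degrees of freedom, and standardized residuals can be recomputed from the cell counts. The sketch below is an illustration in Python rather than the SPSS procedure itself. The variable names are assumptions, and the p value uses a closed-form expression that holds only when the degrees of freedom are even (as they are here, df = 8).

```python
import math

rows = ["Very happy", "Pretty happy", "Not too happy"]
columns = ["Married", "Widowed", "Divorced", "Separated", "Never married"]

# Observed counts from the Example 1 contingency table.
observed = [
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
std_residuals = []
for i, row in enumerate(observed):
    residual_row = []
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        # Standardized residual: (observed - expected) / sqrt(expected)
        residual = (count - expected) / math.sqrt(expected)
        residual_row.append(residual)
        # Each squared standardized residual is that cell's share of chi-square.
        chi_square += residual ** 2
    std_residuals.append(residual_row)

df = (len(rows) - 1) * (len(columns) - 1)

# Upper-tail probability of the chi-square distribution.
# This closed form is valid for EVEN df only (an assumption that holds here).
half = chi_square / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
for label, residual_row in zip(rows, std_residuals):
    print(label, [round(r, 1) for r in residual_row])
```

The largest positive residuals fall in the married/very happy cell and the never married/not too happy cell, which is the pattern described in the interpretation.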

Example 2 - Chi square test

The table and data analysis examine the relationship between favoring or opposing the death penalty for the crime of murder and the sex of the respondent. Possible survey questions are also provided below.

1. What is your sex?
___1 Male ___2 Female

2. Do you favor or oppose the death penalty for the crime of murder?
___1 Favor ___2 Oppose

SPSS Output of a Cross Tabulation and Chi Square Test

FAVOR OR OPPOSE THE DEATH PENALTY FOR MURDER by SEX

                                                       SEX
FAVOR OR OPPOSE DEATH PENALTY FOR MURDER          1 MALE   2 FEMALE     Total
1 FAVOR    Count                                     199        232       431
           % within RESPONDENTS SEX                81.9%      73.7%     77.2%
           Std. Residual                              .8        -.7
2 OPPOSE   Count                                      44         83       127
           % within RESPONDENTS SEX                18.1%      26.3%     22.8%
           Std. Residual                            -1.5        1.3
Total      Count                                     243        315       558
           % within RESPONDENTS SEX               100.0%     100.0%    100.0%

(Note on the Symmetric Measures output for Example 1: the value of Cramer’s V, .173, denotes a “weak” association.)

Chi-Square Tests

                                Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              5.301    1        .021
Continuity Correction           4.843    1        .028
Likelihood Ratio                5.385    1        .020
Fisher's Exact Test                                             .025         .013
Linear-by-Linear Association    5.291    1        .021
N of Valid Cases                  558

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 55.31.

Symmetric Measures

                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .097           .021
                     Cramer's V    .097           .021
N of Valid Cases                    558

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Brief Interpretation of the Analysis of Attitude Toward the Death Penalty for Murder with Respondent’s Sex

There is a significant (χ² = 5.301, df = 1, p = .021) but weak association (Phi = .097) between a person favoring or opposing the death penalty for the crime of murder and the person’s sex. Women are significantly more likely to oppose the death penalty for the crime of murder than are men.
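For a 2x2 table the Pearson chi square and the continuity-corrected (Yates) value can be computed directly from the four cell counts. The sketch below is an illustration in Python, not the SPSS procedure. The function names are assumptions, and the p value formula shown applies only to df = 1.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True, N/2 is subtracted from |ad - bc| before squaring
    (the continuity correction), never letting the difference go below zero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi_square):
    """Two-sided p value for a chi-square statistic with df = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2))


# Counts from Example 2: rows are favor/oppose, columns are male/female.
pearson = chi_square_2x2(199, 232, 44, 83)
corrected = chi_square_2x2(199, 232, 44, 83, yates=True)

print(f"Pearson chi-square    = {pearson:.3f}, p = {p_value_df1(pearson):.3f}")
print(f"Continuity correction = {corrected:.3f}, p = {p_value_df1(corrected):.3f}")
```

The corrected value is slightly smaller than the Pearson value, so its p value is slightly larger. With 558 cases and a minimum expected count of 55.31, the correction does not change the conclusion in this example.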

Statistics is like grout - the word feels decidedly unpleasant in the mouth, but it describes something essential for holding a mosaic in place.
- Ramsey & Schafer

Appendix 1 Frequencies SPSS Screens

Highlight and move the variables from the variable list to the “Variable(s)” box by clicking the arrowhead.

Click the “OK” button to run Frequencies. Output for the first variable, “quality,” is shown below.

quality  Quality of Svc

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1 Poor               1        .9              .9                   .9
        2 Fair               2       1.9             1.9                  2.8
        3 Good              17      15.9            15.9                 18.7
        4 Excellent         87      81.3            81.3                100.0
        Total              107     100.0           100.0

In this output, “quality” is the variable name, “Quality of Svc” is the variable label, and “1 Poor” through “4 Excellent” are the values.
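The columns of a frequency table are simple to reproduce, which can help when checking that your data were entered correctly. The sketch below is an illustration in Python, not SPSS output. The list of responses is rebuilt from the frequencies shown for “quality,” and the variable names are assumptions made for this example.

```python
from collections import Counter

# Value labels for the "quality" variable, as shown in the output.
labels = {1: "Poor", 2: "Fair", 3: "Good", 4: "Excellent"}

# Responses rebuilt from the frequencies in the table (107 cases, none missing).
responses = [1] * 1 + [2] * 2 + [3] * 17 + [4] * 87

counts = Counter(responses)
total = sum(counts.values())

table = []
running_count = 0
for code in sorted(counts):
    running_count += counts[code]
    percent = round(100 * counts[code] / total, 1)
    # Cumulative percent is based on the running count, not on summed rounded percents.
    cumulative = round(100 * running_count / total, 1)
    table.append((labels[code], counts[code], percent, cumulative))

for label, frequency, percent, cumulative in table:
    print(f"{label:<10}{frequency:>5}{percent:>8}{cumulative:>8}")
print(f"{'Total':<10}{total:>5}")
```

Because no responses are missing in this example, Percent and Valid Percent are identical, so only one percent column is computed here.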

Appendix 2 Crosstab and Chi Square SPSS Screens

Click “Statistics” to get the “Crosstabs: Statistics” dialogue box

Click “Cells” to get the “Crosstabs: Cell Display” dialogue box

References

Holcomb, Z. C. (2006). SPSS basics: Techniques for a first course in statistics. Glendale, CA: Pyrczak Publishing.

Kachigan, S. K. (1986). Statistical analysis: An interdisciplinary introduction to univariate and multivariate methods. New York: Radius Press.

Keller, G. (2001). Applied statistics with Microsoft® Excel. Pacific Grove, CA: Duxbury.

Norusis, M. J. (2011). IBM SPSS Statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.

Ramsey, F. L., & Schafer, D. W. (2002). The statistical sleuth: A course in methods of data analysis (2nd ed.). Belmont, CA: Duxbury Press.

Rubin, A., & Babbie, E. (2008). Research methods for social work (7th ed.). Belmont, CA: Thomson Brooks/Cole.