Data Analysis Statistics

Post on 22-Dec-2015


Page 1: Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and

Data Analysis

Statistics

Levels of Measurement

• Nominal – Categorical; no implied rankings among the categories. Also includes written observations and written responses from qualitative interviews or open-ended survey questions.

• Ordinal – Categorical data with implied rankings or data obtained through respondent ranking of categories. In some cases, a ranking process may be set up for a particular variable.

• Interval – No fixed zero point. Data is numerical, not categorical. Rank order among values is explicit, with an equal distance between points in the data set: -2, -1, 0, +1, +2

• Ratio – Fixed zero point; otherwise the same as interval.

In general, the type of data can be inferred using the following criteria:

• Any categorical data is either nominal or ordinal.

• All qualitative data is nominal.

• All scores on standardized scales are either interval or ratio. (Note: almost all the scales we use in social work, except IQ scores, are ratio.)

• The level of measurement determines what statistical method we can use.

In some cases, we can convert a variable into another level of measurement.

We can change a variable from ratio to either ordinal or nominal.

Converting Data (Use Recode in SPSS)

Data Set: 5, 8, 4, 2, 9, 6, 10, 7, 3, 1

Categories    Occurrences
1 to 2        2
3 to 5        3
6 to 8        3
9 to 10       2
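The recode step can be sketched outside SPSS as well. The following is a minimal Python illustration, not the SPSS Recode procedure itself; the helper name `recode` and the label format are assumptions made for this example. It groups the ten scores from the slide into the four categories and counts the occurrences in each.

```python
from collections import Counter

# The ten ratio-level scores from the slide's "Data Set" column.
scores = [5, 8, 4, 2, 9, 6, 10, 7, 3, 1]

# Category boundaries (inclusive on both ends), copied from the slide.
bins = [(1, 2), (3, 5), (6, 8), (9, 10)]

def recode(value):
    """Return the label of the category that contains value (hypothetical helper)."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low} to {high}"
    raise ValueError(f"{value} does not fit any category")

# Count how many scores land in each category.
occurrences = Counter(recode(s) for s in scores)
for low, high in bins:
    label = f"{low} to {high}"
    print(label, occurrences[label])
```

Because the categories are ranked ranges, the recoded variable is ordinal; the exact values can no longer be recovered from it.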

Advantages of using ratio data

• We can convert it to another level of data; we can’t do this with nominal data.

• People can simply write down information about how they fit a particular attribute (age, income).

• We have more statistical options with ratio data. Most parametric inferential statistics require the dependent variable to be interval or ratio.

Primary types of data analysis are:

• Qualitative

• Descriptive. Used to describe the distribution of a single variable or the relationship between two nominal variables (mean, frequencies, cross-tabulation)

• Inferential (Used to establish relationships among variables; assumes random sampling and a normal distribution)

• Nonparametric (Used to test relationships in small samples or in data sets that are not normally distributed)

Much of what you will use in your research will be descriptive statistics.

For example, the most basic type of descriptive statistic is the frequency. Frequencies are the number of times a specific value, or data within a specific category, occurs.

Most often we convert frequencies to percentages. The formula is (f/n) × 100, where f = frequency and n = the total number of values in a data set. For example, if the age 25 occurs 5 times in a data set of 50, then 5/50 = 0.10, or 10%.
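As a rough sketch of the same arithmetic in Python: the function name `percentages` is an assumption for this example, and the data set is hypothetical apart from the one fact taken from the slide (the age 25 occurs 5 times among 50 values).

```python
from collections import Counter

def percentages(values):
    """Map each distinct value to its share of the data set, in percent."""
    n = len(values)
    return {value: 100 * f / n for value, f in Counter(values).items()}

# Hypothetical data set of 50 ages in which the age 25 occurs 5 times,
# mirroring the slide's example; the other ages are made up.
ages = [25] * 5 + [30] * 20 + [40] * 25
print(percentages(ages)[25])
```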

Examples of use of frequency data

• 40% of respondents are male.

• The mean level of income was $35,000.

• 40% of all female voters cast their vote for Arnold compared to 52% of the male voters.

*Note: the other descriptive statistic we use is the standard deviation. It describes the degree to which data points vary from the mean of a distribution. In a research article, you will see the standard deviation included with the mean.

Application of Standard Deviation (SD)

• First data set: mean income was $35,000 with SD = $5,000

• Second data set: M = $23,000, SD = $500

• This is interpreted as there being less variability in income among members of the second data set. That is, scores are grouped more tightly around the mean.
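To make the comparison concrete, here is a small sketch using Python's standard `statistics` module. Both income lists are invented for illustration and are not the data behind the slide; they were chosen only so that one group is spread out and the other is tightly clustered.

```python
import statistics

# Hypothetical incomes, invented for illustration (not the slide's data).
spread_out = [28000, 31000, 35000, 39000, 42000]
tight = [22500, 22800, 23000, 23200, 23500]

for name, incomes in [("spread_out", spread_out), ("tight", tight)]:
    m = statistics.mean(incomes)
    sd = statistics.stdev(incomes)  # sample SD (n - 1 in the denominator)
    print(f"{name}: M = {m:.0f}, SD = {sd:.0f}")
```

The group with the smaller SD is the one whose values sit closer to its own mean, regardless of which group has the higher mean.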

Normal Distribution

• Mean = median = mode

• Bell-shaped curve

• 50% of scores fall below and 50% fall above the mean.

• Data set can be assessed in terms of how much data falls within one, two, or three standard deviations from the mean.

• Generally is unimodal although some distributions may be bimodal or trimodal.

• Theoretically, at least, inferential statistics may only be used when a set of scores conforms to a normal distribution. However, this assumption is often violated.
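For a perfectly normal distribution, roughly 68%, 95%, and 99.7% of scores fall within one, two, and three standard deviations of the mean. The sketch below checks those figures with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an assumption for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Comparing these theoretical shares with the shares observed in a real data set is one informal way to judge how far the data depart from normality.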

Frequencies are used in almost all types of data analysis. Frequency tables can be formatted in a variety of ways.

(Some analyses add valid percent and cumulative percent.)

Age          Number   Percent
0-18         10       20.0%
19-34        15       30.0%
35-64        15       30.0%
65 & over    10       20.0%
Total        50       100%
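The percent column, and the cumulative percent that some printouts add, can be derived directly from the counts. This is a minimal sketch using the counts from the age table; the variable names are assumptions for this example.

```python
# Counts copied from the age frequency table.
age_counts = [("0-18", 10), ("19-34", 15), ("35-64", 15), ("65 & over", 10)]

total = sum(count for _, count in age_counts)
running = 0
rows = []
for label, count in age_counts:
    running += count  # cumulative count up to and including this category
    rows.append((label, count, 100 * count / total, 100 * running / total))

# Columns: category, number, percent, cumulative percent.
for label, count, pct, cum_pct in rows:
    print(f"{label:<10} {count:>3} {pct:6.1f}% {cum_pct:6.1f}%")
```

Cumulative percent is only meaningful when the categories have a natural order, as age ranges do.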

We can also use tables to determine if there is a relationship between two nominal variables, although we cannot assess the strength of the relationship. This is called a cross-tabulation.

Starting Salary        Female       Male
$20,000 to $29,999     19 (70%)     5 (23%)
$30,000 to $39,999     7 (26%)      14 (64%)
$40,000 to $54,999     1 (4%)       3 (13%)
Total                  27 (100%)    22 (100%)
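The percentages in a cross-tabulation like this one are column percentages: each cell count divided by its column total. A short sketch, using the counts from the table and variable names chosen for this example:

```python
# Counts copied from the cross-tabulation: (female, male) per salary band.
crosstab = {
    "$20,000 to $29,999": (19, 5),
    "$30,000 to $39,999": (7, 14),
    "$40,000 to $54,999": (1, 3),
}

female_total = sum(f for f, _ in crosstab.values())
male_total = sum(m for _, m in crosstab.values())

# Column percentages: each cell as a share of its own column total.
column_pct = {
    band: (100 * f / female_total, 100 * m / male_total)
    for band, (f, m) in crosstab.items()
}
for band, (f_pct, m_pct) in column_pct.items():
    print(f"{band}: female {f_pct:.1f}%, male {m_pct:.1f}%")
```

Note that 3 of 22 is about 13.6%, slightly different from the 13% printed in the table; the printed figure was presumably adjusted so that the male column sums to 100%.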

Categories in both qualitative and quantitative analysis must be:

• Mutually exclusive (no overlap)

• Exhaustive (all possible categories should be included)

Cross-tabulation is the basis for chi-square. Chi-square:

• Tests whether the relationship between the two variables in the table is statistically significant.

• Is not technically an inferential statistic – does not require a normal distribution – but is often grouped with inferential statistics.

• Usually requires a random sample although data collected from everyone in a population group is usually considered sufficient for a chi-square analysis.
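As an illustration of how chi-square builds on a cross-tabulation, the sketch below computes the statistic by hand for the starting-salary table, using only the Python standard library. It is a worked example under stated assumptions, not SPSS output: expected counts are taken as (row total × column total) / N, and the p value uses the closed form that holds only when there are exactly 2 degrees of freedom.

```python
import math

# Observed counts from the starting-salary cross-tabulation:
# rows are salary bands, columns are (female, male).
observed = [[19, 5], [7, 14], [1, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom the chi-square p value is exp(-x / 2);
# other df values need a table or a statistics library.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```

Two of the expected counts in this table are below 5, a situation in which the chi-square approximation is commonly considered less reliable, so the result should be read as an arithmetic illustration rather than a finding.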

Means can also be used to make comparisons among groups.

Income    M          SD
Male      $35,000    $5,000
Female    $22,000    $750

You may use means on your project

• If your variables include ratio data

• If you want to compare groups on a ratio variable

• If you want to summarize scores on a standardized instrument or a Likert scale

Some inferential statistics look at the strength of the relationship between mean scores on ratio-level variables and membership in a particular demographic group:

• T-tests (two-group comparisons)

• Analysis of variance (compares three or more groups)

These answer the question: Is the difference in means between the two (or more) groups large enough to be statistically significant?
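A two-group comparison can be sketched by computing a t statistic by hand. The example below uses Welch's form, which does not assume equal variances; the slides do not say which variant they intend, so this choice, the function name `welch_t`, and the test scores are all assumptions made for illustration. Turning t into a p value requires a t table or a statistics package and is not shown.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical test scores, invented for illustration.
group_1 = [72, 85, 78, 90, 66, 81]
group_2 = [70, 74, 69, 80, 65, 73]
t = welch_t(group_1, group_2)
print(f"t = {t:.2f}")
```

The statistic grows when the gap between the group means is large relative to the variability within the groups, which is the intuition behind asking whether a difference is "large enough."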

We also use correlations to measure the strength of a relationship between two variables. Correlations can only be used:

• To assess the strength of the relationship between two ratio-level variables.

• To measure associations rather than cause and effect relationships.

• With data sets in which there are 30 or more observations.
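The correlation coefficient itself is straightforward to compute. Below is a minimal sketch of Pearson's r written out by hand; the function name `pearson_r` and the paired scores are hypothetical, and the sample is deliberately much smaller than the 30 observations recommended above so that it stays readable.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mean_x for xi in x]  # deviations from the mean of x
    dy = [yi - mean_y for yi in y]  # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical paired scores (far fewer than 30 observations;
# kept short only for readability).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]
print(f"r = {pearson_r(hours, score):.2f}")
```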

Inferential statistics commonly used include:

• Independent t-test (compares two groups on one variable). (Test statistic = t)

• Paired-samples t-test (compares ratio-level scores on pre- and post-test data). (Test statistic = t)

• ANOVA – compares three or more groups on ratio data (Test statistic = F)

• Correlation – measures the association between two ratio-level variables (Test statistic = r)

• Regression analysis – the dependent variable is ratio; the model can include more than one independent variable, and these can be a combination of ratio, ordinal, and nominal data. (Test statistic is R², F, or partial correlation coefficients)

Inferential statistics require that we assess the probability that there is actually a relationship between two variables.

• We state the research and null hypotheses.

• State the degree to which we will risk being wrong about whether or not a relationship actually exists between two variables (level of significance – usually .10 or lower).

• Choose an appropriate statistical test and compute it.

• Compare the probability level (p value) on your computer printout to the level of significance. If the p value is lower than your level of significance, reject the null hypothesis. If the p value is higher than the level of significance, accept (more precisely, fail to reject) the null hypothesis.

For example:

• There is a positive relationship between scores on the self-esteem scale and depression. Level of significance is .05. r = .75, p = .01. Reject the Null Hypothesis and accept the Research Hypothesis.

• Women will have higher test scores than men. Level of significance = .10. t = .30, p = .60. Accept the Null Hypothesis and reject the Research Hypothesis.
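The decision rule behind both examples is a single comparison of the p value against the chosen level of significance. A minimal sketch, with a hypothetical function name `decide`, applied to the two examples:

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject the null hypothesis when p < alpha."""
    if not (0 <= p_value <= 1 and 0 < alpha < 1):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    return "reject null" if p_value < alpha else "fail to reject null"

# The two examples from the slide.
print(decide(p_value=0.01, alpha=0.05))  # self-esteem and depression
print(decide(p_value=0.60, alpha=0.10))  # women's and men's test scores
```

The sketch says "fail to reject" where the slides say "accept"; the two phrasings describe the same outcome of the comparison.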

Other info

• Chi-square is interpreted in the same way as inferential statistics.

• Most statistics books contain tables that let you determine p values if you calculate test statistics by hand.

• SPSS printouts always contain p values for inferential statistics.

• Theoretical assumptions are often violated in research articles.

• Sample size affects whether a relationship between two or more variables is large enough to be statistically significant.

• Relationships between two variables can be either positive or negative. High positive relationships are close to +1.00 and high negative relationships are close to -1.00.