introduction to spss and statistics.ppt

Introduction to SPSS and Statistics

Dr. Nasser Mansourn.mansour@ex.ac.uk

• Measurement Scales• Reliability: http://www.youtube.com/watch?

v=ER6HriPNKDs• SPSS tutorial http://www.youtube.com/watch?

v=4mf3OA1ifXI• Descriptive statistics• inferential statistics • Normality: http://www.youtube.com/watch?

v=IiedOyglLn0

• Reverse the coding of a variable in SPSS http://www.youtube.com/watch?v=xKJisOMvY54

http://www.youtube.com/watch?v=iQRst7_3bhY • Parametric - Nonparametric Stats

Measurement Scales

• There are four types of measurement scales– Nominal Scales

– Ordinal Scales

– Interval Scales

– Ratio Scales

A Nominal Scale of Measurement

An Ordinal Scale: The Winner of a Horse Race

Interval Scales• Very similar to an ordinal scale with the exception of

the equal intervals of points.• Assumes that equal differences between scores really

mean equal differences in the variable measured.– but no natural zero – Achievement scale– e.g., temperature (C,F)– e.g., Likert scales, rank on a scale of 1-3

Very true A bit true Not true

Science is my favourite because it is about finding out how things work

Science is my favourite because it is about what people feel and how they think

Ratio Scales

• A ratio scale involves the use of numbers to represent equal distances from a known “zero point”.

• A scale designed to measure height would be a ratio scale since the zero point represents the absence of height.– e.g., height, weight, age, length

Four Types of Measurement Scales

Measurement scales

• Categorical or ‘Nominal’– e.g. male/female, or catholic/protestant/other

• Independent variable

• Dependent variable

Summary statistics

• Two essential components of data are:– (i) central tendency of the data &

(ii) spread of the data (e.g. standard deviation)

Although mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful

Measures of central tendency• Mode

– the most frequent observation: 1, 2, 2, 3, 4 ,5

Median the middle number of a dataset arranged in numerical

order: 0, 1, 2, 5, 1000

(average of middle two numbers when even number of scores exist)

Mean = =

The Standard Deviation • The Standard Deviation is a measure of how spread

out numbers are

68% of values are within1 standard deviation of the mean

99.7% are within 3 standard deviations

Normal Distribution

• the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:

Normality & Skewness

Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the left.Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right.

Normality & Kurtosis

variance• variance measures how far a set of numbers is spread out. (A variance of zero

indicates that all the values are identical.) A non-zero variance is always positive: A small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out from the mean and from each other.

• T-test or ANOVA assumes that the populations tested have the same variance. T-test or ANOVA tests to see if the central tendency (mean) difference between samples (middle diagram).

TEST FOR NORMALITY USING SPSS

Please watch this video: http://www.youtube.com/watch?v=IiedOyglLn0

Terminology

Population

Sample

Describing Ordinal Data - Frequencies

confidence in speaking time 1

1 2.5 2.5 2.5

2 5.0 5.0 7.5

8 20.0 20.0 27.5

10 25.0 25.0 52.5

8 20.0 20.0 72.5

6 15.0 15.0 87.5

5 12.5 12.5 100.0

40 100.0 100.0

ValidFrequency Percent Valid Percent

CumulativePercent

An Overall Test for Normality

Tests of Normality

.109 40 .200* .975 40 .509attitude to schoolStatistic df Sig. Statistic df Sig.

Kolmogorov-Smirnova

Shapiro-Wilk

This is a lower bound of the true significance.*.

Lilliefors Significance Correctiona.

Parametric versus Nonparametric Statistics – When to use them and which is more powerful?

Parametric Assumptions

• The observations must be independent (For example participants need to have completed the dependent variable separately, not in groups).

• The observations must be drawn from normally distributed populations

• These populations must have the same variances (the variance is a measure of how far a set of numbers is spread out)

• parametric test, of course, is a test that requires a parametric assumption, such as normality. A nonparametric test does not rely on parametric assumptions like normality.

• a nonparametric test protects against some violations of assumptions and not others. The two sample t-test requires three assumptions, normality, equal variances, and independence. The non-parametric alternative, the Mann-Whitney-Wilcoxon test, does not rely on the normality assumption,

Differences between independent groups

• Two samples – compare mean value for some variable of interest

Parametric Nonparametric

t-test for independent samples

Wald-Wolfowitz runs test

Mann-Whitney U test

Kolmogorov-Smirnov two sample test

Differences between independent groups

• Multiple groups

Analysis of variance (ANOVA/ MANOVA)

Kruskal-Wallis analysis of ranks

Median test

Differences between dependent groups

• Compare two variables measured in the same sample

• If more than two variables are measured in same sample

t-test for dependent samples

Sign test

Wilcoxon’s matched pairs test

Repeated measures ANOVA

Friedman’s two way analysis of variance

Cochran Q

Relationships between variables

• Two variables of interest are categorical

Correlation coefficient

Spearman R

Kendall Tau

Coefficient Gamma

Chi square

Phi coefficient

Fisher exact test

Kendall coefficient of concordance

Summary Table of Statistical TestsLevel of

MeasurementSample Characteristics Correlation

1 Sample

2 Sample K Sample (i.e., >2)

Independent Dependent Independent Dependent

Categorical or Nominal

Χ2 or bi-

nomial

Χ2 Macnarmar’s Χ2

Χ2 Cochran’s Q

Rank or Ordinal

Mann Whitney U

Wilcoxin Matched

Pairs Signed Ranks

Kruskal Wallis H

Friendman’s ANOVA

Spearman’s rho

Parametric (Interval &

Ratio)

z test or t test

t test between groups

t test within groups

1 way ANOVA between groups

1 way ANOVA

(within or repeated measure)

Pearson’s r

Factorial (2 way) ANOVA

Paired-Samples t Test

• Also known as the t test for dependent means

• Also known as the Dependent t-test

Definitions for Paired-Samples t Test

Definitions of Test

• Sample mean is what researcher will find – The value (score) by using statistical analysis

(mean)

• Paired sample = paired scores

• Paired = matched (they go together)

Interpreting SPSS Output for t Test

Interpreting Output Ch. 12 Holcomb Paired-Samples t Test Output – Unformatted

Paired Samples Statistics

11.5556 9 2.29734 .76578

9.6667 9 2.23607 .74536

Pretest

Posttest

Mean N Std. DeviationStd. Error

Paired Samples Correlations

9 .527 .145Pretest & PosttestPair 1N Correlation Sig.

Paired Samples Test

1.88889 2.20479 .73493 .19414 3.58364 2.570 8 .033Pretest - PosttestPair 1Mean Std. Deviation

Std. ErrorMean Lower Upper

95% ConfidenceInterval of the

Difference

Paired Differences

t df Sig. (2-tailed)

The notion of statistical significance

Mean score = 6 Mean score = 8

Basic notion of significance• Assume no difference in outcome between the two groups.

• This is our NULL HYPOTHESIS Ho

• Would it be unusual for two random samples from one population to have means as different as 6 and 8?

• If yes, we could assume that the two groups came from different populations

• Difference between 6 and 8 is statistically significant. Reject Ho

• If no, we would say that the difference is probably just down to chance

• Difference is not statistically significant. Accept Ho

Test for Significance

• The probability is small that the difference or relationship happened by chance, and p is less than the critical alpha level (p < alpha ).

• Your finding is significant. • You reject the null hypothesis. • The probability is high that the difference or relationship

happened by chance, and p is greater than the critical alpha level (p > alpha ). :

• Your finding is not significant. • You fail to reject the null hypothesis.

The columns labeled "Levene's Test for Equality of Variances" tell us whether an assumption of the t-test has been met. The t-test assumes that the variability of each group is approximately equal. If that assumption isn't met, then a special form of the t-test should be used. Look at the column labeled "Sig." under the heading "Levene's Test for Equality of Variances". In this example, the significance (p value) of Levene's test is .203. If this value is less than or equal to your α level for the test (usually .05), then you can reject the null hypothesis that the variability of the two groups is equal, implying that the variances are unequal. then you should use the bottom row of the output (the row labeled "Equal variances not assumed.") If the p value is greater than your α level, then you should use the middle row of the output (the row labeled "Equal variances assumed.") In this example, .203 is larger than α, so we will assume that the variances are equal and we will use the middle row of the output.

The column labeled "Sig. (2-tailed)" gives the two-tailed p value associated with the test. In this example, the p value is .151. the decision rule is given by: If p ≤ α , then reject H0. In this example, .151 is not less than or equal to .05, so we fail to reject H0. That implies that we failed to observe a difference in the number of older siblings between the two sections of this class.

If we were writing this for publication in an APA journal, we would write it as:A t test failed to reveal a statistically reliable difference between the mean number of older siblings that the 10 AM section has (M = 0.86, s = 1.027) and that the 11 AM section has (M = 1.44, s = 1.318), t(44) = 1.461, p = .151, α = .05.

Reporting Statistics in APA StyleA Short Guide to Handling Numbers and Statistics in APA Format

• http://my.ilstu.edu/~mshesso/apa_stats_format.html• Description: The 16 teenagers who volunteered for the pilot study

were younger than expected, M = 14.2 years, SD = 1.3. • Correlation: The correlation of peer reports (M = 4.2, SD = 2.1, N

= 367) and self reports (M = 5.8, SD = 2.3) of victimization was highly significant, r(365) = .32, p = .008.

• Regression: A linear regression analysis revealed that social skills was a highly significant predictor of aggression scores (b = .40, p = .008), accounting for 16% of the variance in aggressive behavior.

• Achievement test scores were regressed on class size and number of writing assignments. These two predictors accounted for just under half of the variance in test scores (R2 = .49), which was highly significant, F(2,289) = 12.5, p=.005. Both the writing assignment (b = .46, p=.001) and the class size (b=.28, p = .014) demonstrated significant effects on the achievement scores.

• t Tests: The 36 study participants had a mean age of 27.4 (SD = 12.6) and were significantly older than the university norm of 21.2 years, t(35) = 2.95, p = 0.01.

• The 25 participants had an average difference from pre-test to post-test anxiety scores of -4.8 (SD = 5.5), indicating the anxiety treatment resulted in a highly significant decrease in anxiety levels, t(24) = -4.36, p = .005 (one-tailed).

• The 36 participants in the treatment group (M = 14.8, SD = 2.0) and the 25 participants in the control group (M = 16.6, SD = 2.5), demonstrated a significant difference in performance (t[59] = -3.12, p = .01); as expected, the visual priming treatment inhibited performance on the phoneme recognition task.

Introductory to SPSSStatistical Package for the Social

Sciences

Developing a coding schemeDeveloping a coding schemeandand

setting up your SPSS data filesetting up your SPSS data fileplease watch this:please watch this:

http://www.youtube.com/watch?v=4mf3OA1ifXI

DEVELOPING A CODING SCHEME DEVELOPING A CODING SCHEME AND SETTING UP YOUR SPSS FILEAND SETTING UP YOUR SPSS FILE

The following steps must be taken in the order presented below:

1. Develop a Coding Scheme:

(Ground Rules to be Followed When Coding the Data)HOW?– By starting with a blank copy of your survey(For an example, see the following Faculty

Unionization Survey)

DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME

(Continued on the next slide)

Developing a coding scheme and setting up Developing a coding scheme and setting up your spss fileyour spss file

The following steps must be taken in the orderpresented below:

1. Develop a Coding Scheme--use a blank survey to:

A. Specify any additional variables that may need to be added to the data set.

B. Identify your reverse items (items that will have to be reverse coded before they are used in any computations

and/or analysis).

Campus: ISU NIU SIU

ID____

Deadline: Before After

ATTITUDE TOWARD UNIONS:

Liberalism/CONSERVATISM:

DEVELOPING A CODING SCHEME DEVELOPING A CODING SCHEME AND SETTING UP YOUR SPSS AND SETTING UP YOUR SPSS

FILEFILEThe following steps must be taken in the order presented below:

1. Develop a Coding Scheme--use a blank survey to:

A. Specify any additional variables that may need to be added to the data set. B. Identify your reverse items (items that will have to be reverse coded before they are used in any computations and/or analysis). C. Assign variable names to all your variables. D. Specify how you wish to code the values of each variable.E. Specify how you will code missing values (e.g., non- responses).

A true non-response is ordinarily coded as a blank.

uni1uni2

runi3runi4rcon1con2r

con3con4con5r

con7con8

Campus: ISU=1 NIU=2 SIU=3

ID_____

Deadline: Before=0 After=1

marital

rank rankyr

degree

experyr

instyr

tenure

0 1125

2. Create your new SPSS FILE and define the attributes of your variables following the instructions in your handouts entitled:“Make sure to follow the ground rules you have established in your coding scheme. That is, to designate:

A. Variable Names

B. Variable Labels

C. Value Labels (e.g., for variable gender, 0=Male, 1=Female this is especially important for categorical variables.)

D. User-Missing Values

E. Variable Types (i.e., numeric or string)

F. Variable Formats (maximum number of digits and decimals required for coding the values)

G. Column Formats (column width used to display a variable on the screen)

– NOTE: It may be more convenient to assign the same attributes to several variables by copying and pasting the attributes.

3. Code the data into your SPSS file following the ground

rules established in your coding scheme and observed

when creating your SPSS file.

4. Reverse code your reverse items--only after all the coding is

completed (Menu Bar: Transform, Recode into a new var.)

Developing a coding scheme and setting up your Developing a coding scheme and setting up your spss filespss file

5. Compute and print summary/descriptive statistics--e.g., mean, std. dev., min., and max. for metric variables; and frequencies, min., and max. for categorical variables

(Menu Bar: Analyze, Descriptive Statistics)

6. Examine the data and summary statistics (e.g., frequencies, min. & max. values) to spot and correct coding errors.

7. Calculate reliability ONLY for each of your summated multi-item scale and, (a) identify items detracting from reliability, and (b) refine the scale accordingly

(Menu Bar: Analyze, Scale, Reliability Analysis, Statistics--“Inter-item correlations” and “Scale if item deleted”)

Negatively-keyed items and reverse-scoring

• A self-esteem questionnaire might include an item such as “I like myself”, which is rated on a 5-point likert scale (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree). This item is positively-keyed and the item indicat.

• a self-esteem questionnaire might include an item such as “I dislike myself”, rated on the same 5-point scale (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree). This item is negatively-keyed, because an Agreement or Strong Agreement with the item indicates a relatively low level of self-esteem and the item indicates a relatively low level of self-esteem

Negatively-keyed items and reverse-scoring• If a questionnaire includes positively-keyed and negatively-keyed items,

then the negatively-keyed items MUST be “reverse-scored” before computing individuals’ total scores and before conducting many psychometric analyses (e.g., reliability analysis).

• To reverse score an item, we transform or re-code the responses so that high “scores” on the item indicate high levels of the attribute being measured (and so that low scores indicate low levels of the attribute). For example, if an individual taking a self-esteem test responded “1” (Strongly Disagree) to the “I dislike myself” item, then we recode this individual’s response to a 5. Thus, the reverse-scored item now has a high score (a 5 instead of a 1), which indicates a high level of self-esteem.

Reverse the coding of a variable in SPSS

• Watch these two videos

• http://www.youtube.com/watch?v=xKJisOMvY54

http://www.youtube.com/watch?v=iQRst7_3bhY

8. Create a composite/summated variable for each summated multi-item scale

(Menu Bar: Transform, Compute--Target Variable = a mathematical expression)See handout entitled “SPSS Tutorial: Calculating New Data Values.”(SPSS FILE “UNIONIZATION CODING

EXERCISE_FICTITIOUS DATA”)9. Compute Descriptive Statistics for different subgroups

separately (Menu Bar: Data, Split File, Organize Output by Groups--specify

the grouping variable) OR (Menu Bar: Analyze, Compare Means, Means, Dep. List, Indep. List, Select Options, OK)

10. Save and print your SPSS data file (Menu Bar: File, Save As…)11. Analyze the data (Menu Bar: Analyze--select analysis options)

• Overview of SPSS• A brief tour• Transforming data

– Recode– Compute– Select If

• Univariate analysis– Frequencies– Descriptives– Explore

Overview of SPSS

• SPSS is a statistical package for beginning, intermediate, and advanced data analysis

• Other statistical packages include SAS and Stata

SPSS Files and Extensions

• Data file -- .sav

• Output file -- .spo

Opening SPSS

• Go to start and find SPSS for Windows

• Click on SPSS 21.0 for Windows to open

• You’ll need to update your SPSS license every year (or your school technician will do it for you)

The Workspace

Variables

Toggle between Data and Variable

Value labels

Creating Your Own Data File

It involves creating:– Variable names– Variable labels– Value labels– Missing values

Data Entry (by hand)1. Click Variable View

2. Click the Row 1, Name cell and type Campus (no spaces allowed in name)

4. Type 2 for the value and dubai for the label- click Add and then OK

3. Click the Row 1, Values cell and type 1 for the value and abu dhabi for the label- click Add

Data Entry (by hand)

What’s Next?

• Now you know how to open an existing SPSS data file and how to enter data

• Let’s do a quick overview of SPSS and then we’ll learn how to analyze your data

Univariate Analysis

• Now that we know how to open existing files, we’re ready to begin analyzing data

• Univariate analysis refers to analyzing variables one-at-a-time

Types of Univariate Analysis Procedures

• Frequencies

• Descriptives

• Explore

Frequencies• Go to Analyze/Descriptive

Statistics/Frequencies

Bar Charts

• Bar charts – click on Analyze/Descriptive Statistics/Frequencies

• Click on “Charts”

• Select “Bar Charts” and click on “Continue” and then on OK

Histograms

• Click on click on Analyze/Descriptive Statistics/Frequencies

• Click on “Charts”• Select “Histograms” and click on “Continue” and

then on OK

Statistics

• Click on Analyze/Descriptive Statistics/Frequencies

• Click on “Statistics”

• Select the statistics you want and click on “Continue” and then on OK

More Exercises for Frequencies

• Run the frequency distribution for AGE

• Get a histogram for AGE

• Compute the following statistics for AGE:– Mean– Median– Standard deviation– Percentiles – 25th, 50th, and 75th

Descriptives

• Click on Analyze/Descriptive Statistics/Descriptives

• Select AGE and REl

• Click on “Options” and select the statistics you want and then click on “Continue” and OK

Exercises for Descriptives

• Use Descriptives to compute the following statistics for AGE– Mean– Standard deviation– Variance– Skewness– Kurtosis

Explore

• Click on Analyze/Descriptive Statistics/Explore

• Select EDUC and put it in the “Dependent List”

• In the Display box on the lower left, click on “Both”

• Click on OK

Selecting Statistics for Explore

• Click on Analyze/Descriptive Statistics/Explore

• Click on “Statistics” and select the statistics you want

• Click on “Continue” and then OK

Selecting Plots for Explore

• Click on “Plots”

• Select the plots you want

• Click on “Continue” and then OK

introduction to spss and statistics.ppt

Documents

introduction to spss 18

introduction to spss 16 - university of...

introduction to spss

[ppt]i. introduction - department of statisticsaa/sta6126/3....

spss syntax introduction

introduction to spss - university of...

spss introduction document-sahyadri

introduction to spss 19 -...

introduction to spss...introduction to spss spss is the...

introduction to spss – part 1

introduction to spss - arizona state...

vital health statistics.ppt

introduction to spss - arizona state...

an introduction to spss & endnote

1. introduction to spss...

introduction to spss 19 -...

an introduction to spss

introduction to the spss · introduction to the spss 1st...

introduction to spss 19 - aarhus universitet · 2015. 5....

active statistics.ppt