introduction to spss and statistics.ppt
Post on 03-Dec-2015
181 Views
Preview:
TRANSCRIPT
Introduction to SPSS and Statistics
Dr. Nasser Mansourn.mansour@ex.ac.uk
• Measurement Scales• Reliability: http://www.youtube.com/watch?
v=ER6HriPNKDs• SPSS tutorial http://www.youtube.com/watch?
v=4mf3OA1ifXI• Descriptive statistics• inferential statistics • Normality: http://www.youtube.com/watch?
v=IiedOyglLn0
• Reverse the coding of a variable in SPSS http://www.youtube.com/watch?v=xKJisOMvY54
http://www.youtube.com/watch?v=iQRst7_3bhY • Parametric - Nonparametric Stats
Measurement Scales
• There are four types of measurement scales– Nominal Scales
– Ordinal Scales
– Interval Scales
– Ratio Scales
A Nominal Scale of Measurement
An Ordinal Scale: The Winner of a Horse Race
Interval Scales• Very similar to an ordinal scale with the exception of
the equal intervals of points.• Assumes that equal differences between scores really
mean equal differences in the variable measured.– but no natural zero – Achievement scale– e.g., temperature (C,F)– e.g., Likert scales, rank on a scale of 1-3
Very true A bit true Not true
Science is my favourite because it is about finding out how things work
Science is my favourite because it is about what people feel and how they think
Ratio Scales
• A ratio scale involves the use of numbers to represent equal distances from a known “zero point”.
• A scale designed to measure height would be a ratio scale since the zero point represents the absence of height.– e.g., height, weight, age, length
Four Types of Measurement Scales
9
Measurement scales
• Categorical or ‘Nominal’– e.g. male/female, or catholic/protestant/other
• Independent variable
• Dependent variable
10
Summary statistics
• Two essential components of data are:– (i) central tendency of the data &
(ii) spread of the data (e.g. standard deviation)
Although mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful
11
Measures of central tendency• Mode
– the most frequent observation: 1, 2, 2, 3, 4 ,5
Median the middle number of a dataset arranged in numerical
order: 0, 1, 2, 5, 1000
(average of middle two numbers when even number of scores exist)
Mean = =
The Standard Deviation • The Standard Deviation is a measure of how spread
out numbers are
68% of values are within1 standard deviation of the mean
99.7% are within 3 standard deviations
13
Normal Distribution
• the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:
Normality & Skewness
Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the left.Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right.
Normality & Kurtosis
variance• variance measures how far a set of numbers is spread out. (A variance of zero
indicates that all the values are identical.) A non-zero variance is always positive: A small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out from the mean and from each other.
• T-test or ANOVA assumes that the populations tested have the same variance. T-test or ANOVA tests to see if the central tendency (mean) difference between samples (middle diagram).
TEST FOR NORMALITY USING SPSS
Please watch this video: http://www.youtube.com/watch?v=IiedOyglLn0
Terminology
Population
Sample
Describing Ordinal Data - Frequencies
confidence in speaking time 1
1 2.5 2.5 2.5
2 5.0 5.0 7.5
8 20.0 20.0 27.5
10 25.0 25.0 52.5
8 20.0 20.0 72.5
6 15.0 15.0 87.5
5 12.5 12.5 100.0
40 100.0 100.0
1.00
2.00
3.00
4.00
5.00
6.00
7.00
Total
ValidFrequency Percent Valid Percent
CumulativePercent
An Overall Test for Normality
Tests of Normality
.109 40 .200* .975 40 .509attitude to schoolStatistic df Sig. Statistic df Sig.
Kolmogorov-Smirnova
Shapiro-Wilk
This is a lower bound of the true significance.*.
Lilliefors Significance Correctiona.
Parametric versus Nonparametric Statistics – When to use them and which is more powerful?
Parametric Assumptions
• The observations must be independent (For example participants need to have completed the dependent variable separately, not in groups).
• The observations must be drawn from normally distributed populations
• These populations must have the same variances (the variance is a measure of how far a set of numbers is spread out)
• parametric test, of course, is a test that requires a parametric assumption, such as normality. A nonparametric test does not rely on parametric assumptions like normality.
• a nonparametric test protects against some violations of assumptions and not others. The two sample t-test requires three assumptions, normality, equal variances, and independence. The non-parametric alternative, the Mann-Whitney-Wilcoxon test, does not rely on the normality assumption,
Differences between independent groups
• Two samples – compare mean value for some variable of interest
Parametric Nonparametric
t-test for independent samples
Wald-Wolfowitz runs test
Mann-Whitney U test
Kolmogorov-Smirnov two sample test
Differences between independent groups
• Multiple groups
Parametric Nonparametric
Analysis of variance (ANOVA/ MANOVA)
Kruskal-Wallis analysis of ranks
Median test
Differences between dependent groups
• Compare two variables measured in the same sample
• If more than two variables are measured in same sample
Parametric Nonparametric
t-test for dependent samples
Sign test
Wilcoxon’s matched pairs test
Repeated measures ANOVA
Friedman’s two way analysis of variance
Cochran Q
Relationships between variables
• Two variables of interest are categorical
Parametric Nonparametric
Correlation coefficient
Spearman R
Kendall Tau
Coefficient Gamma
Chi square
Phi coefficient
Fisher exact test
Kendall coefficient of concordance
Summary Table of Statistical TestsLevel of
MeasurementSample Characteristics Correlation
1 Sample
2 Sample K Sample (i.e., >2)
Independent Dependent Independent Dependent
Categorical or Nominal
Χ2 or bi-
nomial
Χ2 Macnarmar’s Χ2
Χ2 Cochran’s Q
Rank or Ordinal
Mann Whitney U
Wilcoxin Matched
Pairs Signed Ranks
Kruskal Wallis H
Friendman’s ANOVA
Spearman’s rho
Parametric (Interval &
Ratio)
z test or t test
t test between groups
t test within groups
1 way ANOVA between groups
1 way ANOVA
(within or repeated measure)
Pearson’s r
Factorial (2 way) ANOVA
Paired-Samples t Test
Paired-Samples t Test
• Also known as the t test for dependent means
• Also known as the Dependent t-test
Definitions for Paired-Samples t Test
Definitions of Test
• Sample mean is what researcher will find – The value (score) by using statistical analysis
(mean)
• Paired sample = paired scores
• Paired = matched (they go together)
Interpreting SPSS Output for t Test
Interpreting Output Ch. 12 Holcomb Paired-Samples t Test Output – Unformatted
Paired Samples Statistics
11.5556 9 2.29734 .76578
9.6667 9 2.23607 .74536
Pretest
Posttest
Pair1
Mean N Std. DeviationStd. Error
Mean
Paired Samples Correlations
9 .527 .145Pretest & PosttestPair 1N Correlation Sig.
Paired Samples Test
1.88889 2.20479 .73493 .19414 3.58364 2.570 8 .033Pretest - PosttestPair 1Mean Std. Deviation
Std. ErrorMean Lower Upper
95% ConfidenceInterval of the
Difference
Paired Differences
t df Sig. (2-tailed)
The notion of statistical significance
Mean score = 6 Mean score = 8
Basic notion of significance• Assume no difference in outcome between the two groups.
• This is our NULL HYPOTHESIS Ho
• Would it be unusual for two random samples from one population to have means as different as 6 and 8?
• If yes, we could assume that the two groups came from different populations
• Difference between 6 and 8 is statistically significant. Reject Ho
• If no, we would say that the difference is probably just down to chance
• Difference is not statistically significant. Accept Ho
Test for Significance
• The probability is small that the difference or relationship happened by chance, and p is less than the critical alpha level (p < alpha ).
• Your finding is significant. • You reject the null hypothesis. • The probability is high that the difference or relationship
happened by chance, and p is greater than the critical alpha level (p > alpha ). :
• Your finding is not significant. • You fail to reject the null hypothesis.
The columns labeled "Levene's Test for Equality of Variances" tell us whether an assumption of the t-test has been met. The t-test assumes that the variability of each group is approximately equal. If that assumption isn't met, then a special form of the t-test should be used. Look at the column labeled "Sig." under the heading "Levene's Test for Equality of Variances". In this example, the significance (p value) of Levene's test is .203. If this value is less than or equal to your α level for the test (usually .05), then you can reject the null hypothesis that the variability of the two groups is equal, implying that the variances are unequal. then you should use the bottom row of the output (the row labeled "Equal variances not assumed.") If the p value is greater than your α level, then you should use the middle row of the output (the row labeled "Equal variances assumed.") In this example, .203 is larger than α, so we will assume that the variances are equal and we will use the middle row of the output.
The column labeled "Sig. (2-tailed)" gives the two-tailed p value associated with the test. In this example, the p value is .151. the decision rule is given by: If p ≤ α , then reject H0. In this example, .151 is not less than or equal to .05, so we fail to reject H0. That implies that we failed to observe a difference in the number of older siblings between the two sections of this class.
If we were writing this for publication in an APA journal, we would write it as:A t test failed to reveal a statistically reliable difference between the mean number of older siblings that the 10 AM section has (M = 0.86, s = 1.027) and that the 11 AM section has (M = 1.44, s = 1.318), t(44) = 1.461, p = .151, α = .05.
Reporting Statistics in APA StyleA Short Guide to Handling Numbers and Statistics in APA Format
• http://my.ilstu.edu/~mshesso/apa_stats_format.html• Description: The 16 teenagers who volunteered for the pilot study
were younger than expected, M = 14.2 years, SD = 1.3. • Correlation: The correlation of peer reports (M = 4.2, SD = 2.1, N
= 367) and self reports (M = 5.8, SD = 2.3) of victimization was highly significant, r(365) = .32, p = .008.
• Regression: A linear regression analysis revealed that social skills was a highly significant predictor of aggression scores (b = .40, p = .008), accounting for 16% of the variance in aggressive behavior.
• Achievement test scores were regressed on class size and number of writing assignments. These two predictors accounted for just under half of the variance in test scores (R2 = .49), which was highly significant, F(2,289) = 12.5, p=.005. Both the writing assignment (b = .46, p=.001) and the class size (b=.28, p = .014) demonstrated significant effects on the achievement scores.
• t Tests: The 36 study participants had a mean age of 27.4 (SD = 12.6) and were significantly older than the university norm of 21.2 years, t(35) = 2.95, p = 0.01.
• The 25 participants had an average difference from pre-test to post-test anxiety scores of -4.8 (SD = 5.5), indicating the anxiety treatment resulted in a highly significant decrease in anxiety levels, t(24) = -4.36, p = .005 (one-tailed).
• The 36 participants in the treatment group (M = 14.8, SD = 2.0) and the 25 participants in the control group (M = 16.6, SD = 2.5), demonstrated a significant difference in performance (t[59] = -3.12, p = .01); as expected, the visual priming treatment inhibited performance on the phoneme recognition task.
Introductory to SPSSStatistical Package for the Social
Sciences
Developing a coding schemeDeveloping a coding schemeandand
setting up your SPSS data filesetting up your SPSS data fileplease watch this:please watch this:
http://www.youtube.com/watch?v=4mf3OA1ifXI
DEVELOPING A CODING SCHEME DEVELOPING A CODING SCHEME AND SETTING UP YOUR SPSS FILEAND SETTING UP YOUR SPSS FILE
The following steps must be taken in the order presented below:
1. Develop a Coding Scheme:
(Ground Rules to be Followed When Coding the Data)HOW?– By starting with a blank copy of your survey(For an example, see the following Faculty
Unionization Survey)
DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME
(Continued on the next slide)
DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME
44
Developing a coding scheme and setting up Developing a coding scheme and setting up your spss fileyour spss file
The following steps must be taken in the orderpresented below:
1. Develop a Coding Scheme--use a blank survey to:
A. Specify any additional variables that may need to be added to the data set.
B. Identify your reverse items (items that will have to be reverse coded before they are used in any computations
and/or analysis).
DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME
R R R
R
R R
Campus: ISU NIU SIU
ID____
Deadline: Before After
ATTITUDE TOWARD UNIONS:
Liberalism/CONSERVATISM:
46
DEVELOPING A CODING SCHEME DEVELOPING A CODING SCHEME AND SETTING UP YOUR SPSS AND SETTING UP YOUR SPSS
FILEFILEThe following steps must be taken in the order presented below:
1. Develop a Coding Scheme--use a blank survey to:
A. Specify any additional variables that may need to be added to the data set. B. Identify your reverse items (items that will have to be reverse coded before they are used in any computations and/or analysis). C. Assign variable names to all your variables. D. Specify how you wish to code the values of each variable.E. Specify how you will code missing values (e.g., non- responses).
A true non-response is ordinarily coded as a blank.
DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME
uni1uni2
runi3runi4rcon1con2r
con3con4con5r
con6r
con7con8
Campus: ISU=1 NIU=2 SIU=3
ID_____
Deadline: Before=0 After=1
DEVELOPING A CODING SCHEMEDEVELOPING A CODING SCHEME
age
sex
race
marital
rank rankyr
degree
experyr
instyr
leave
tenure
pay
123
456
0 1125
34
0 1
12
34
0 1
123
45
45
123
12
34
1 2 3
49
2. Create your new SPSS FILE and define the attributes of your variables following the instructions in your handouts entitled:“Make sure to follow the ground rules you have established in your coding scheme. That is, to designate:
A. Variable Names
B. Variable Labels
C. Value Labels (e.g., for variable gender, 0=Male, 1=Female this is especially important for categorical variables.)
D. User-Missing Values
E. Variable Types (i.e., numeric or string)
F. Variable Formats (maximum number of digits and decimals required for coding the values)
Developing a coding scheme and setting up Developing a coding scheme and setting up your spss fileyour spss file
G. Column Formats (column width used to display a variable on the screen)
– NOTE: It may be more convenient to assign the same attributes to several variables by copying and pasting the attributes.
3. Code the data into your SPSS file following the ground
rules established in your coding scheme and observed
when creating your SPSS file.
4. Reverse code your reverse items--only after all the coding is
completed (Menu Bar: Transform, Recode into a new var.)
Developing a coding scheme and setting up your Developing a coding scheme and setting up your spss filespss file
5. Compute and print summary/descriptive statistics--e.g., mean, std. dev., min., and max. for metric variables; and frequencies, min., and max. for categorical variables
(Menu Bar: Analyze, Descriptive Statistics)
6. Examine the data and summary statistics (e.g., frequencies, min. & max. values) to spot and correct coding errors.
7. Calculate reliability ONLY for each of your summated multi-item scale and, (a) identify items detracting from reliability, and (b) refine the scale accordingly
(Menu Bar: Analyze, Scale, Reliability Analysis, Statistics--“Inter-item correlations” and “Scale if item deleted”)
Developing a coding scheme and setting up Developing a coding scheme and setting up your spss fileyour spss file
Negatively-keyed items and reverse-scoring
• A self-esteem questionnaire might include an item such as “I like myself”, which is rated on a 5-point likert scale (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree). This item is positively-keyed and the item indicat.
• a self-esteem questionnaire might include an item such as “I dislike myself”, rated on the same 5-point scale (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree). This item is negatively-keyed, because an Agreement or Strong Agreement with the item indicates a relatively low level of self-esteem and the item indicates a relatively low level of self-esteem
Negatively-keyed items and reverse-scoring• If a questionnaire includes positively-keyed and negatively-keyed items,
then the negatively-keyed items MUST be “reverse-scored” before computing individuals’ total scores and before conducting many psychometric analyses (e.g., reliability analysis).
• To reverse score an item, we transform or re-code the responses so that high “scores” on the item indicate high levels of the attribute being measured (and so that low scores indicate low levels of the attribute). For example, if an individual taking a self-esteem test responded “1” (Strongly Disagree) to the “I dislike myself” item, then we recode this individual’s response to a 5. Thus, the reverse-scored item now has a high score (a 5 instead of a 1), which indicates a high level of self-esteem.
Reverse the coding of a variable in SPSS
• Watch these two videos
• http://www.youtube.com/watch?v=xKJisOMvY54
http://www.youtube.com/watch?v=iQRst7_3bhY
55
8. Create a composite/summated variable for each summated multi-item scale
(Menu Bar: Transform, Compute--Target Variable = a mathematical expression)See handout entitled “SPSS Tutorial: Calculating New Data Values.”(SPSS FILE “UNIONIZATION CODING
EXERCISE_FICTITIOUS DATA”)9. Compute Descriptive Statistics for different subgroups
separately (Menu Bar: Data, Split File, Organize Output by Groups--specify
the grouping variable) OR (Menu Bar: Analyze, Compare Means, Means, Dep. List, Indep. List, Select Options, OK)
10. Save and print your SPSS data file (Menu Bar: File, Save As…)11. Analyze the data (Menu Bar: Analyze--select analysis options)
Developing a coding scheme and setting up Developing a coding scheme and setting up your spss fileyour spss file
• Overview of SPSS• A brief tour• Transforming data
– Recode– Compute– Select If
• Univariate analysis– Frequencies– Descriptives– Explore
Overview of SPSS
• SPSS is a statistical package for beginning, intermediate, and advanced data analysis
• Other statistical packages include SAS and Stata
SPSS Files and Extensions
• Data file -- .sav
• Output file -- .spo
Opening SPSS
• Go to start and find SPSS for Windows
• Click on SPSS 21.0 for Windows to open
• You’ll need to update your SPSS license every year (or your school technician will do it for you)
61
The Workspace
Cases
Variables
Toggle between Data and Variable
Views
Value labels
Creating Your Own Data File
It involves creating:– Variable names– Variable labels– Value labels– Missing values
66
Data Entry (by hand)1. Click Variable View
2. Click the Row 1, Name cell and type Campus (no spaces allowed in name)
67
4. Type 2 for the value and dubai for the label- click Add and then OK
3. Click the Row 1, Values cell and type 1 for the value and abu dhabi for the label- click Add
Data Entry (by hand)
What’s Next?
• Now you know how to open an existing SPSS data file and how to enter data
• Let’s do a quick overview of SPSS and then we’ll learn how to analyze your data
Univariate Analysis
• Now that we know how to open existing files, we’re ready to begin analyzing data
• Univariate analysis refers to analyzing variables one-at-a-time
Types of Univariate Analysis Procedures
• Frequencies
• Descriptives
• Explore
Frequencies• Go to Analyze/Descriptive
Statistics/Frequencies
Bar Charts
• Bar charts – click on Analyze/Descriptive Statistics/Frequencies
• Click on “Charts”
• Select “Bar Charts” and click on “Continue” and then on OK
Histograms
• Click on click on Analyze/Descriptive Statistics/Frequencies
• Click on “Charts”• Select “Histograms” and click on “Continue” and
then on OK
Statistics
• Click on Analyze/Descriptive Statistics/Frequencies
• Click on “Statistics”
• Select the statistics you want and click on “Continue” and then on OK
More Exercises for Frequencies
• Run the frequency distribution for AGE
• Get a histogram for AGE
• Compute the following statistics for AGE:– Mean– Median– Standard deviation– Percentiles – 25th, 50th, and 75th
Descriptives
• Click on Analyze/Descriptive Statistics/Descriptives
• Select AGE and REl
• Click on “Options” and select the statistics you want and then click on “Continue” and OK
Exercises for Descriptives
• Use Descriptives to compute the following statistics for AGE– Mean– Standard deviation– Variance– Skewness– Kurtosis
Explore
• Click on Analyze/Descriptive Statistics/Explore
• Select EDUC and put it in the “Dependent List”
• In the Display box on the lower left, click on “Both”
• Click on OK
Selecting Statistics for Explore
• Click on Analyze/Descriptive Statistics/Explore
• Click on “Statistics” and select the statistics you want
• Click on “Continue” and then OK
Selecting Plots for Explore
• Click on “Plots”
• Select the plots you want
• Click on “Continue” and then OK
top related