inferential statistics
Outline
• What Statistical Tests to Use?
• Correlation Tests
• t-Tests
• To play around with the data, please download the file Statistics-Inferential.xlsx from https://goo.gl/eY8j6N or http://www.filehosting.org/file/details/491184/Statistics-Inferential.xlsx
Decision on the Statistical Tests
• Depends on the design of the research
  - To see the relationship between the variables?
  - To see if there are any changes in the participants after a certain treatment?
  - Etc.
• Can the results be generalized?
  - Assumptions → conclusions → actions
Why check assumptions?
• Assumptions are important
• Assumption → conclusion → action
• Correct assumption → correct conclusion → correct action
Case: I couldn’t meet Ast today at 1.30 PM
• Assumptions
  - 12-1 PM is the official lunch time in SWCU
  - Everybody needs lunch
  - Classes at FLL usually go from 11 AM – 1 PM, then from 2-4 PM
• Conclusions
  - Every lecturer in SWCU will have lunch at 12-1 PM
  - Every lecturer may teach from 11 AM – 1 PM, then from 2-4 PM
• Action
  - See Ast between 1-2 PM
But…
• Assumptions
  - Ast hates me for God knows what reasons
• Conclusions
  - He will not see me at all
• Action
  - That’s probably why he refuses to see me at 1.30 PM today.
How do you know your assumptions are right?
• It’s regulation/convention
  - But are you sure it’s regulated in SWCU and FLL?
• It’s what usually happens in SWCU and FLL
  - Offices are closed between 12-1 PM
  - Lecturers are seen at campus cafes having lunch during 12-1 PM
  - Schedule of classes
• Where did your assumption go wrong?
  - How can you be so sure that Ast hates you?
What has Ast to do with ResMeth?
• The assumption must be correct, otherwise the conclusion will not be correct
• What made your conclusion wrong in the case of Ast?
  - It rested on feelings, not on what NORMALLY happens, by regulation or convention, in the POPULATION (SWCU/FLL)
• Remember NORMAL DISTRIBUTION?
Looking back at previous meetings…
• The aim of doing quantitative research is to generalize the results to the population
• Assumption
  - Population → normal distribution
  - Sample → normal distribution
• Conclusion
  - If my sample is normally distributed, I can expect to generalize it to the population
• Action
  - My research recommendations can be applied to the population
Parametric vs. Non-Parametric Tests
• Some statistical tests are parametric tests based on the normal distribution
• A parametric test requires parametric data from one of the large catalogue of distributions that statisticians have described (regulation/convention)
• Parametric data → certain assumptions must be true
  - A parametric test on NON-parametric data → inaccurate results
• Very important → check the assumptions before deciding which statistical test is appropriate
How could 2 variables be related?
• Positively related → one goes up, the other goes up
• Not related at all → one stays the same no matter what the other does
• Negatively related → one goes up, the other goes down
Correlational Tests
• Parametric test
  - Pearson’s Product Moment Correlation
• Non-parametric tests
  - Spearman’s Correlation Coefficient
  - Kendall’s tau (τ)
• To decide: check the assumptions
  - 1 assumption violated → non-parametric
What are the underlying assumptions?
1. Related pairs
2. Scale of measurements
3. Normality
4. Linearity
5. Homoscedasticity
Testing:
• 1 & 2 → design of the research
• 3-5 → testable using graphics & tests
Related Pairs
• Data must be collected from related pairs
• 1 data point from one variable, 1 data point from the other variable
• E.g. relationship between gender and English competence
  - Arif has data for gender (“male”) and for English competence (“84 points”)
Scale of Measurements
• Interval or ratio
• Do you still remember what they are?
  - Continuous
  - Not categorical
• E.g. Arif
  - Gender → nominal (categorical)
  - Competence → ratio (continuous)
• One assumption violated!
  - Go to non-parametric (Spearman’s or Kendall’s)
Warning!
• Difference in literature
  - Coakes (2005) → both variables must be continuous (interval)
  - Field (2009) → interval, or one variable can be categorical (binary)
• I’m inclined to Coakes
  - The scatterplot when one variable is interval and the other is binary is not homoscedastic (I’ll show you later why this matters)
Normality
• In MSExcel – (complicated!)
[Figure: Excel histogram of the scores (Series1) with a polynomial trendline]
Normality & Linearity
• In SPSS (relatively easier)
  - Together with the descriptive statistics report & linearity
• Test by:
  - Graphics
  - Normality tests
Normality and Linearity
• Analyze | Descriptive Statistics | Explore
  - Select the variable you want to test
  - Statistics: tick Descriptives
  - Plots: tick Histogram and Normality plots with tests
Normality
• From Kolmogorov-Smirnov (K-S) & Shapiro-Wilk (S-W)
  - Sig. < .05 → significantly different from a normal distribution
  - Competence: Sig. = .008 < .05 → data not normal
  - Shapiro-Wilk is more powerful (so S-W may be significant when K-S is not)
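As a rough illustration of what the K-S test measures, the sketch below computes only the K-S distance D between the sample and a normal curve fitted to the sample's own mean and standard deviation. It is not a reproduction of the SPSS output: the function name and the scores are made up for illustration, and turning D into a Sig. value still needs a table or a statistics package.

```python
from statistics import NormalDist, mean, stdev


def ks_distance_from_normal(data):
    """Largest gap between the sample's cumulative proportions and a fitted normal curve."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: the normal curve uses the sample's own mean and SD.
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, value in enumerate(ordered):
        expected = fitted.cdf(value)
        # Compare the curve with the step just after and just before this value.
        d = max(d, (i + 1) / n - expected, expected - i / n)
    return d


# Hypothetical scores, NOT the data from Statistics-Inferential.xlsx.
scores = [46, 47, 52, 60, 66, 70, 74, 79, 80, 81]
print(f"D = {ks_distance_from_normal(scores):.3f}")
```

A larger D means the data sit further from the bell shape; whether it is large enough to be significant depends on the sample size.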
Normality
• Graphic – Histogram
  - Not bell-shaped → not normal
• Psst.. The normality line here is added as a guide. How? Try right-clicking the graphic & editing the content. Find this icon in the bar: [icon image]
Normality
• Is your data normally distributed?
[Figure: Excel histogram of the scores (Series1) with a polynomial trendline]
Linearity
• How well the data for each variable fall along a straight line
• MS Excel – not possible
• SPSS – yes!
  - See the test of normality
Homoscedasticity
• How your data cluster into certain areas when two variables are related
• To see if they have similar variance along the linear line
• Why is this important?
  - There should not be a wide difference between data
  - Too wide → not normal
Homoscedasticity
• MS Excel – not possible
• SPSS – yes!
  - Graphs | Legacy Dialogs | Scatter/Dot | Simple Scatter
  - Choose the two variables for the X axis and Y axis
• Psst.. The linear line here is added as a guide. How? Try right-clicking the graphic & editing the content. Find this icon in the bar: [icon image]
Homoscedasticity
• Gender vs. Competence
  - Heteroscedasticity
  - Not normal
• Competence vs. Graduation
  - Homoscedasticity
  - Maybe normal
• Can’t do a categorical variable! Coakes wins!
Once you’ve done all of this assumption checking…
• Select the correlational test the data falls into
• Our correlational tests are bivariate correlations
  - Between 2 variables
• We’re not dealing with partial correlation (between 2 variables plus one or more controlling variables) → later, when you’re more ‘grown up’ in statistics
How do we measure relationships?
• Pearson product-moment correlation (standardized measurement)
  - Symbol: r or R
  - Ranges from -1 to +1
  - To measure the size of the effect:
    ± 0.1 → small effect
    ± 0.3 → medium effect
    ± 0.5 → large effect
Pearson’s Correlation Coefficient
• Using MS Excel – Data | Data Analysis | Correlation
• Downsides
  - Only for Pearson’s, not Spearman’s or Kendall’s
  - No indicator of the significance of the relationship
  - Only the strength of the correlation coefficient
              Competence    Graduation
Competence    1
Graduation    0.954149422   1
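To see what a coefficient like the one Excel returns is made of, here is a minimal sketch that computes Pearson's r by hand and labels it with the small/medium/large thresholds from the slides. The function names, the "negligible" label for values below .1, and the scores are assumptions made for illustration; they are not the data in Statistics-Inferential.xlsx.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def effect_size(r):
    """Label |r| using the .1 / .3 / .5 thresholds from the slides."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: the slides do not name this band


# Hypothetical scores for five participants.
competence = [55, 60, 68, 72, 80]
graduation = [58, 66, 70, 75, 84]
r = pearson_r(competence, graduation)
print(f"r = {r:.3f} ({effect_size(r)} effect)")
```

Like the Excel tool, this gives only the strength of the relationship, not its significance.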
Bivariate Correlation (Using SPSS)
• Analyze | Correlate | Bivariate
• Input the variables used in Variables
• Default: Pearson
• Options: Spearman and Kendall
• One- vs. two-tailed
  - One-tailed → directional hypothesis (the more x, the more y)
  - Two-tailed → not sure about the direction
Pearson’s Correlation Coefficient
• Interpretation of the result table
  - ** → significant correlation
  - r value → Pearson Correlation value
  - Significant or not → Sig. < .05
• What do these numbers mean?
Warning: Causality!!!
• Correlation result ≠ causality
• Third-variable problem
  - Maybe there is an influence of a third variable
• Direction of causality
  - No clear indication of which variable causes the other variable to change
Spearman’s Correlation Coefficient
• Non-parametric statistic
  - Not normal data distribution, etc.
  - Not interval data → ordinal data
• Interpretation of the result table
  - ** → significant correlation
  - rs → correlation coefficient value
  - Significant or not → Sig. < .05
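One way to picture Spearman's coefficient is as Pearson's r worked out on ranks instead of raw scores. The sketch below follows that idea, giving tied scores the average of their ranks. The function names and the example scores are hypothetical, and it returns the coefficient only, not the Sig. value.

```python
from math import sqrt


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r applied to the ranked scores."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal-style scores.
print(average_ranks([10, 20, 20, 30]))
print(f"rs = {spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]):.2f}")
```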
Kendall’s tau (τ)
• Non-parametric statistic
  - For a small data set that, when ranked, has many scores with the same rank
  - More accurate generalization than Spearman’s
• Interpretation of the result table
  - ** → significant correlation
  - τ → correlation coefficient value
  - Significant or not → Sig. < .05
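Kendall's tau can be sketched as a count of pairs of participants that agree (concordant) or disagree (discordant) in order on the two variables. The version below is the tau-b variant, which adjusts for tied scores; treating tau-b as the variant of interest is an assumption here, and the function name and example scores are made up for illustration.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant and tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes to neither count
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1  # same order on both variables
            else:
                discordant += 1  # opposite order
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau is undefined when one variable never changes")
    return (concordant - discordant) / denominator


# Hypothetical small data set with a tie in the first variable.
print(f"tau = {kendall_tau_b([1, 1, 2, 3], [1, 2, 3, 4]):.2f}")
```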
How to Report Correlation Coefficients
• Tell:
  - How big
  - Significance value
• Important notes:
  - No zero before the decimal point for the correlation coefficient (for example, .87 NOT 0.87)
  - Correlation coefficients use different letters (r, rs, or τ)
  - One-tailed must be reported
  - Standard criteria for p values (probabilities): .05, .01 and .001
Example of Reports
• Pearson’s
  - There is a significant correlation between X variable and Y variable, r = .87, p (one-tailed) < .05
• Spearman’s
  - X variable is significantly correlated with Y variable, rs = .87 (p < .01)
• Kendall’s
  - There was a positive relationship between X variable and Y variable, τ = .47, p < .05
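The reporting conventions (no leading zero, standard p criteria of .05, .01 and .001) are easy to get wrong by hand, so here is a small sketch of helper functions that apply them. The function names are hypothetical and the values passed in are only examples.

```python
def format_coefficient(value, decimals=2):
    """Format a coefficient without the zero before the decimal point (.87, not 0.87)."""
    text = f"{value:.{decimals}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def p_criterion(p):
    """Report p against the strictest standard criterion it meets."""
    for cutoff, label in ((0.001, ".001"), (0.01, ".01"), (0.05, ".05")):
        if p < cutoff:
            return f"p < {label}"
    return "p > .05"


# Example values only.
print(f"r = {format_coefficient(0.87)}, {p_criterion(0.03)}")
print(f"τ = {format_coefficient(-0.47)}, {p_criterion(0.008)}")
```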
What is it for?
• Looking at the effect(s) of one variable on another
• By systematically changing some aspect of that variable
• To compare two means of the data
Comparing 2 means of data
• Between-group, between-subjects or independent design
  - DIFFERENT participants are assigned to different experimental manipulations
• A repeated-measures design
  - The SAME participants go through different experimental manipulations at different points in time
Comparing 2 Means Using t-Tests
• Different participants: between-groups, between-subjects, or independent design
  - Single sample: one sample compared to the population. E.g. test scores of a group in a semester compared to a previous group’s scores
  - Independent or two-sample: two samples with different conditions. E.g. test scores of 2 groups with different teachers after a semester
• Same participants: repeated-measures design
  - Paired or dependent sample: two sets of data from the same participants. E.g. the scores of a group before and after a semester
Assumptions of the t-tests
1. Scale of measurement – continuous (interval)
2. Random sampling
3. Normality
4. Additional for the independent t-test
   1. Independence of groups – each participant is included in one group only, and not the other group
   2. Homogeneity of variance – Levene’s test (presented in SPSS results for the independent t-test)
Single Sample t-Test
• Comparing the mean of a data set with a set mean from other aggregate data
• MS Excel → no!
• SPSS → Analyze | Compare Means | One-Sample T Test
  - Input the Test Variable to be compared
  - Input the Test Value (aggregate data)
Single Sample t-Test: Results & Report
• Significant → Sig. < .05
• t positive → this data > previous aggregate data
• Reporting: There is no significant difference in the graduation grade between this year’s participants and the previous year’s participants (t(19) = .493, p > .05), although this year’s participants have a slightly higher grade (Mean Difference = 1.4)
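Since MS Excel has no built-in tool for this test, a minimal sketch of the single sample t-test arithmetic may help: the difference between the sample mean and the test value, divided by the standard error of the mean. The function name, the grades and the test value are hypothetical; the sketch returns t and df only, so the Sig. value still has to come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, test_value):
    """Return (t, df) for comparing a sample mean against a fixed test value."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - test_value) / standard_error
    return t, n - 1


# Hypothetical graduation grades and a made-up previous-year mean of 70.
grades = [65, 72, 78, 69, 74, 71, 80, 66]
t, df = one_sample_t(grades, 70)
print(f"t({df}) = {t:.3f}")
```

For df = 19, the two-tailed critical value at .05 is about 2.093, so a t of .493 like the one in the report above is well short of significance.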
Using MS Excel for Other t-Tests
• Only for
  - Paired-sample t-test
  - Independent t-test, assuming equal variances
  - Independent t-test, assuming unequal variances
• Reject or fail to reject the null hypothesis → H0: there is no difference of means between the two variables
Paired-Samples t-Test
• Comparing the means of the same group of participants under two conditions
• Samples → two sets of data, but paired (from the same participants)
• E.g. the pre-test vs. post-test scores of a group of participants
• E.g. the scores of a group of participants after being taught using pictures vs. film
Paired-Sample t-Test in MSExcel
• H0 = there is no difference between the two groups
• Data | Data Analysis | t-Test: Paired Two Sample for Means | Select Variable 1 & 2 | Select Output Range
• P(T<=t) two-tail < .05 (equivalently, |t Stat| > t Critical two-tail) → reject H0
  - What’s the result?
• t Stat is minus → the pre (competence) < the post (graduation)
Paired-Samples t-Test in SPSS
• Analyze | Compare Means | Paired-Samples T Test | Input the two variables
Results
• Paired-Samples Statistics
• Paired-Samples Correlations
  - Pearson’s r and Sig. (r → see effect; significant → Sig. < .05)
• Paired-Samples Test
  - Mean = difference of means between groups
  - t value = minus → first variable has the smaller mean
  - df = sample size – 1 (degrees of freedom)
  - Sig. = significant → p < .05
Results
• Paired-Samples Correlations: Pearson’s r significant → Sig. < .05; Correlation → size of effect
• Paired-Samples Test: significant → Sig. < .05; t minus → first variable has the smaller mean
Reporting on Results
• On average, the participants have significantly higher scores on graduation grade (M = 71.40, SE = 2.001) than on competence score (M = 67.95, SE = 2.328), t(19) = .00, p < .05, with a large effect (r = .954)
Legend
• M – mean
• SE – standard error
• t(19) – the number in brackets is df
• r – effect size (large effect)
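The paired-samples t-test boils down to a single sample t-test on the differences between each participant's two scores. The sketch below shows that arithmetic under stated assumptions: the function name and the pre/post scores are made up, and it returns t and df only, not the Sig. value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for two sets of scores from the same participants."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of scores in both sets")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    t = mean(differences) / standard_error
    return t, n - 1  # df = number of pairs - 1


# Hypothetical pre-test and post-test scores for six participants.
pre = [60, 65, 70, 58, 72, 66]
post = [64, 66, 75, 63, 74, 70]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.3f}")  # negative t: the first set has the smaller mean
```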
Independent t-Test
• Compares the means of two groups of participants in two different conditions
• The groups are independent of each other
  - MS Excel – always assume unequal variances, or do F-Test Two-Sample for Variances to decide if they are equal/unequal, then choose the appropriate independent t-test
  - SPSS – checked using Levene’s test in the results of the independent t-test
• E.g. the scores of two groups of participants after being taught using pictures vs. film
Independent t-Test using MSExcel
• Data | Data Analysis | t-Test: Two-Sample Assuming Unequal Variances | Select Variable 1 & 2 (by group) | Select Output Range
• H0 = there is no difference between the two groups
• P(T<=t) two-tail < .05 (equivalently, |t Stat| > t Critical two-tail) → reject H0
  - What’s the result?
• t Stat is minus → pictures group < film group
Independent t-Test Using SPSS
• Analyze | Compare Means | Independent-Samples T Test | Insert the test variable & grouping variable
Results
• Group Statistics
• Independent Samples Test
  - Homogeneity of variances using Levene’s test – should NOT be significant (group variances are similar) → Sig. > .05
  - If so, read the Sig. in the “Equal variances assumed” row (otherwise read the “Equal variances not assumed” row)
  - Mean = difference of means between groups
  - t value = minus → first group has the smaller mean
  - df = total sample size – 2 when equal variances are assumed (degrees of freedom)
  - Sig. = significant → p < .05
Results
• Levene’s test: Sig. > .05 → group variances are similar (good!) → equal variances assumed
• t-test: Sig. > .05 → not significant (significant would be Sig. < .05)
• Mean Difference minus → first group has the smaller mean
Reporting on Results
• On average, participants who were taught using film had higher scores (M = 72, SE = 2.921) than those taught using pictures (M = 70.80, SE = 2.878). This difference was not significant, t(18) = -.773, p > .05.
• Legend – same as in the dependent t-test
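The two versions of the independent t-test (equal variances assumed vs. not assumed) differ only in how the standard error and df are worked out. Here is a minimal sketch of both, with a hypothetical function name and made-up scores; it returns t and df only, and it does not run Levene's test.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2, equal_variances=True):
    """Return (t, df) for two independent groups of scores."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)
    difference = mean(group1) - mean(group2)
    if equal_variances:
        # Pooled variance; df = total sample size - 2.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Unequal variances (Welch): df is adjusted and usually not a whole number.
        a, b = v1 / n1, v2 / n2
        standard_error = sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return difference / standard_error, df


# Hypothetical scores for a pictures group and a film group.
pictures = [68, 72, 65, 74, 70]
film = [71, 75, 69, 73, 77]
t, df = independent_t(pictures, film)
print(f"t({df}) = {t:.3f}")  # negative t: the first group has the smaller mean
```

With 20 participants split into two groups, the equal-variances df is 18, which matches the t(18) in the report above.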
Confused?
• Ask now
• Ask me – F 505 by appointment
• Email me – [email protected]
• Tweet me – @nenyish
• This presentation file is available at: