Introductory Statistics Presentation Outline: Types of statistics Statistical test definitions Simple statistical tests

Upload: phungthuy

Post on 13-Mar-2018


Page 1: Introductory Presentation Outline: Statistics · PDF fileHypothesis testing: ... Parametric - assumes a normal distribution based on ... Contingency tables = frequency tables = cross-tabulation

Introductory Statistics

Presentation Outline:

Types of statistics

Statistical test definitions

Simple statistical tests


What are statistics?

One way to describe statistics is as a set of

scientific techniques used for learning in the

presence of variation

Statistical measures such as p-values and confidence intervals help to quantify how much we can learn from a sample of data


There are two types of statistics

Descriptive Statistics

Concerned with presentation, organization, and summarization

of data

Give a lot of VERY important information about data prior to

performing Inferential statistics (%’s, means, confidence

intervals)
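As a small illustration of the descriptive summaries mentioned above (percentages and means), here is a minimal sketch. The sample values and variable names are hypothetical and are not taken from the presentation.

```python
from statistics import mean, stdev

# Hypothetical sample: ages of 8 participants and whether each one smokes.
ages = [34, 45, 29, 52, 41, 38, 47, 34]
smoker = [True, False, False, True, False, False, True, False]

n = len(ages)
mean_age = mean(ages)                 # central tendency
sd_age = stdev(ages)                  # sample standard deviation (n - 1 denominator)
pct_smokers = 100 * sum(smoker) / n   # percentage for a categorical variable

print(f"n = {n}")
print(f"mean age = {mean_age:.1f}, SD = {sd_age:.1f}")
print(f"smokers = {pct_smokers:.1f}%")
```

Summaries like these are usually inspected before any inferential test is run.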

Inferential Statistics

Used to make inferences about the characteristics of the

population from the characteristics of a random sample drawn

from the population

Hypothesis testing: using data samples to establish the

credibility of a theory about the population

P values are calculated from the different inferential statistical tests to quantify the evidence against the null hypothesis; a small p value supports, but does not prove, the study hypothesis


Inferential Statistics

Inferential statistics are then divided into two more categories:

Parametric - assumes a normal distribution based on

population means and standard deviations

• Interval/Integer (pain level, temperature)

• Ratio (weight-Body Mass Index)

Nonparametric - makes no assumptions about the shape of the distribution underlying the data; the data still follow some distribution, but we do not need to know what it looks like

• Nominal (gender, ethnicity)

• Ordinal (tumor position)


Inferential Statistics cont’d.

The type of inferential statistical test performed is driven by the

type of data being analyzed

Types of Data

Categorical / Nominal data consists of named categories with

no implied order among the categories.

Ordinal / Rank data consists of ordered categories, where the

differences between categories cannot be considered to be

equal.

Continuous data may take any value within a defined range and

assumes equal distances between values.


Choosing an appropriate test

Assumptions

Inferential tests have certain assumptions that you should be familiar with before you use a test

Violating the assumptions can produce misleading results

Considerations for choosing a test

• Variables

• Distribution

• Parametric and non-parametric


The type of research design and type of data will ultimately drive the appropriate statistical test used (if all assumptions of the statistical test are met).

Type of statistical test, by type of design (rows) and type of data (columns):

Type of design                                 | Continuous (parametric) | Ordinal (non-parametric)       | Categorical (non-parametric)
Compares 2 independent groups                  | Independent t-test      | Wilcoxon-Mann-Whitney test     | Chi-square test (r x 2)
Compares 3 or more independent groups          | One-way ANOVA           | Kruskal-Wallis test            | Chi-square test (r x k)
Compares pre and post in the same sample       | Paired t-test           | Wilcoxon signed rank test      | McNemar change test
Compares multiple measures in the same sample  | Repeated measures ANOVA | Friedman test                  | Cochran Q test
Correlation between two variables              | Pearson correlation     | Spearman correlation           | Kappa coefficient
To model which variables predict an outcome    | Multiple regression     | Multinomial logistic regression | Logistic regression


Chi-Square Test (Pearson’s, goodness-of-fit)

Underlying concept: do the observed frequencies

differ from the expected frequencies?

If H0 (NULL) is true: observed ≈ expected (any difference is due to chance)

If HA (ALTERNATIVE) is true: observed ≠ expected

Design is represented by contingency tables (often stated in percentages, but counts are the level of analysis)

Contingency tables = frequency tables = cross-tabulation tables
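To make "expected frequencies" concrete, the sketch below computes the expected count for each cell of a contingency table under the null hypothesis of independence (row total × column total / N). The counts and the function name `expected_counts` are hypothetical, chosen only for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 counts: rows = treatment / control,
# columns = improved / not improved.
observed = [[30, 20],
            [20, 30]]
expected = expected_counts(observed)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

The chi-square test then asks whether the observed counts are further from these expected counts than chance alone would explain.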


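Generic Contingency Table

               Column 1   Column 2   Row total
Row 1          A          B          A + B
Row 2          C          D          C + D
Column total   A + C      B + D      N = A + B + C + D

For a 2 x 2 contingency table the Chi-square statistic is calculated by the formula:

$$\chi^2 = \frac{N\,(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}, \qquad N = A + B + C + D$$

Just like with the t-test, the computation will result in a test statistic and the associated p-value that will allow discussion of the group differences.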
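A minimal sketch of the 2 x 2 calculation, assuming the standard shortcut formula χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)] without a continuity correction. The cell counts are hypothetical, and the p-value uses the fact that for 1 degree of freedom the chi-square tail probability equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts laid out as in the generic table: A, B / C, D.
stat, p = chi_square_2x2(30, 20, 20, 30)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Statistical packages often apply a continuity correction to 2 x 2 tables by default, so their output may differ slightly from this uncorrected value.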


Fisher’s Exact Test

Similar to the chi square test, but is preferred with:

Smaller sample sizes

Severely unequal cell distribution

Cells with an expected frequency below 5 (some sources use 10 as the cutoff)
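For small tables, the exact test can be written out directly. The sketch below is one common two-sided definition (summing the hypergeometric probabilities of every table that is no more likely than the observed one, with the margins held fixed); the counts and the function name are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small sample: 8 subjects in total.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p:.4f}")
```

Because the probabilities are computed exactly rather than approximated, no minimum expected cell count is needed.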


T-test

Independent t-test / Student’s t-test – compares continuous data (means) between 2 independent groups (fairly robust to moderate departures from normality)

Paired t-test – compares continuous data (means)

between 2 dependent / matched / paired groups
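The two t statistics can be sketched as follows. This assumes the pooled-variance (Student) form for independent groups; the data are hypothetical, and the p-value lookup from the t distribution is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(x, y):
    """Student's t with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def paired_t(before, after):
    """Paired t on the within-subject differences; returns (t, df)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical scores for two independent groups.
t_ind, df_ind = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])

# Hypothetical pre/post measurements on the same four subjects.
t_pair, df_pair = paired_t([10, 12, 14, 16], [11, 14, 15, 18])

print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.3f}, df = {df_pair}")
```

The paired version analyzes one column of differences, which is why its degrees of freedom depend on the number of pairs rather than the number of observations.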


ANOVA

3 or more groups

Multiple repeated measures

Within and between subject designs (also MIXED

design)

Study Design (one-way ANOVA analysis)

• Group A: full dose of drug ‘wonderful’

• Group B: half-dose of drug ‘wonderful’

• Group C: placebo

• t-test = 3 pairwise tests (A vs. B, A vs. C, B vs. C), ANOVA = 1 test to tell if a difference exists
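A minimal sketch of the one-way ANOVA F statistic for a three-group design like the one above. The outcome scores are hypothetical (the presentation gives no data), and the group names simply mirror the slide.

```python
from itertools import combinations
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical outcome scores for each arm of the study design.
data = {
    "full dose": [4, 5, 6],
    "half dose": [6, 7, 8],
    "placebo":   [8, 9, 10],
}

f_stat, df1, df2 = one_way_anova_f(list(data.values()))
n_pairwise = len(list(combinations(data, 2)))  # separate t-tests needed instead

print(f"F({df1}, {df2}) = {f_stat:.2f}")
print(f"pairwise t-tests that one ANOVA replaces: {n_pairwise}")
```

A significant F only says that some difference exists among the groups; follow-up comparisons are needed to locate it.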


Two-way ANOVA

Two independent variables (IV) (Example: 2 x 2 design, DV = BMI)

                     Drug (weight loss)
                     YES       NO
Diet plan    YES     20 (A)    26 (B)
             NO      23 (C)    29 (D)

Main effect of Diet: Yes (A + B) vs. No (C + D)

Simple main effect of Drug within Diet = Yes: A vs. B

Interaction: does the effect of Drug differ between the Diet groups? Compare (A - B) with (C - D)
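The effects in the 2 x 2 example can be read off the four cell values. The sketch below assumes that the numbers are cell means of BMI and that every cell has the same number of subjects (neither is stated on the slide); it computes contrasts only, not F tests.

```python
# Cell means from the table; keys are (diet plan, drug).
cells = {
    ("yes", "yes"): 20,  # A
    ("yes", "no"):  26,  # B
    ("no",  "yes"): 23,  # C
    ("no",  "no"):  29,  # D
}

def marginal_mean(factor_index, level):
    """Mean of the cells at one level of a factor (0 = diet, 1 = drug)."""
    values = [m for key, m in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effects: difference between marginal means.
diet_effect = marginal_mean(0, "yes") - marginal_mean(0, "no")
drug_effect = marginal_mean(1, "yes") - marginal_mean(1, "no")

# Interaction: is the drug effect the same with and without the diet plan?
drug_effect_on_diet = cells[("yes", "yes")] - cells[("yes", "no")]   # A - B
drug_effect_off_diet = cells[("no", "yes")] - cells[("no", "no")]    # C - D
interaction = drug_effect_on_diet - drug_effect_off_diet

print(f"main effect of diet: {diet_effect}")
print(f"main effect of drug: {drug_effect}")
print(f"interaction contrast: {interaction}")
```

With these particular numbers the drug lowers BMI by the same amount in both diet groups, so the interaction contrast is zero and the two effects are purely additive.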


Repeated Measure ANOVA

Within-subject design only, or within-between (mixed) design

Every subject is exposed to each level of an

IV

Most common example: time

All other types of repeated measure are less

common and not appropriate for many

questions

Tip: avoid order effects


Measures of Association

Pearson’s Correlation

A measure of the strength of the linear association between 2 variables

Ranges from -1 to +1

Correlation coefficient is r

-/+ signs indicate the direction of the relationship

Correlation ≠ causation

Rule-of-thumb sizes for |r|: 0.10 (small), 0.30 (medium), 0.50 (large)

Spurious/illusory correlations

Statistical significance does not mean the results are

clinically significant
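A minimal sketch of Pearson's r and the size thresholds listed above. The data are hypothetical, and the label used for values below 0.10 is an assumption, since the slide does not name that range.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def effect_size_label(r):
    """Map |r| onto the small / medium / large thresholds."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"  # assumed label; not given on the slide

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f} ({effect_size_label(r)})")
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is why the labels are based on |r|.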


Confidence Intervals (CIs)

A range of values within which a researcher can

say with a certain degree of confidence that a

population parameter will fall.

Originally designed to analyze a sample of samples, but

is now used on one sample

Useful when the mean is uncertain due to conflicting results

Meta-analysis

Used to test non-inferiority and superiority

Provides confidence without specificity
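As an illustration of how a confidence interval for a mean is built from one sample, here is a sketch using the normal approximation (mean ± z × standard error). This is an assumption made to stay within the Python standard library; for small samples a t-based interval is normally used and would be somewhat wider. The measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return m - half_width, m + half_width

# Hypothetical systolic blood pressure readings (mmHg).
sample = [118, 122, 125, 130, 121, 127, 124, 119, 128, 126]

lower, upper = mean_ci(sample)            # 95% interval
lower99, upper99 = mean_ci(sample, 0.99)  # higher confidence, wider interval

print(f"95% CI: {lower:.1f} to {upper:.1f}")
print(f"99% CI: {lower99:.1f} to {upper99:.1f}")
```

Asking for more confidence widens the interval, which is the trade-off between confidence and specificity noted above.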


CIs and Hypothesis Testing


Factors That Create Misleading Results

Restricted range tends to reduce r

Nonlinear relationships – cannot use

Pearson’s correlation

X and/or Y have skewed distributions – underestimate r

Outliers – over- or underestimate r

Extreme groups – overestimate r


Non-Normal distribution

Spearman rank correlation

Uses the same ranking principle as the Mann-Whitney test

Same characteristics as Pearson’s r

• Ranges from -1 to +1

• Correlation coefficient is rs

• -/+ signs indicate the direction of the relationship

• Correlation ≠ causation

• Spurious correlations are possible

• Statistical significance does not mean clinically significant

Not a correlation for dichotomous data (use chi-square instead)
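Spearman's coefficient can be sketched as Pearson's r applied to ranks, with tied values sharing the average of their ranks. The helper names and data below are hypothetical.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: perfectly monotonic but clearly non-linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

rs = spearman_rs(x, y)
r = pearson_r(x, y)
print(f"Spearman rs = {rs:.3f}, Pearson r = {r:.3f}")
```

Because only the ordering matters, the extreme value in `y` does not pull rs away from 1, whereas Pearson's r on the raw values is lower.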


Non-parametric group comparisons

Mann-Whitney U Test

Alternative to the independent t-test when normal

distribution is severely violated

Converts raw scores to ranks

Compares ranks between groups to determine if there is

a difference

Wilcoxon Signed-Ranks Test

Alternative to the dependent t-test when normal

distribution is severely violated

Compares rankings of difference scores
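Both rank-based tests reduce to simple sums of ranks. The sketch below computes only the test statistics (U and W), under the common conventions of reporting the smaller of the two possible values and dropping zero differences in the signed-rank test; the data are hypothetical, and p-values would come from tables or a statistics package.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Smaller of the two U statistics for independent groups x and y."""
    ranks = average_ranks(list(x) + list(y))  # rank all scores together
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

def wilcoxon_signed_rank(before, after):
    """Smaller of the positive and negative rank sums of the differences."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical independent groups with no overlap in scores.
u = mann_whitney_u([1, 2, 3], [4, 5, 6])

# Hypothetical pre/post scores for five subjects.
w = wilcoxon_signed_rank([10, 12, 14, 16, 18], [12, 11, 17, 20, 23])

print(f"U = {u}, W = {w}")
```

Small values of U or W indicate that one group (or one direction of change) dominates the rankings.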





Thank You!