Section I. Statistics

What do they mean and why are they important?

• To be an intelligent consumer of statistics, your first reflex must be to question the statistics that you encounter. The British Prime Minister Benjamin Disraeli famously said, "There are three kinds of lies -- lies, damned lies, and statistics."

• It is important to think about the numbers, their sources, and most importantly, the procedures used to generate them.

What do stats mean?

• Weather forecasts
• Emergency preparedness
• Predicting disease
• Medical studies
• Genetics
• Political campaigns
• Insurance
• Consumer goods
• Quality testing
• Stock market

Top 10 ways you use statistics every day

• Six good reasons to study statistics
– to be able to effectively conduct research,
– to be able to read and evaluate journal articles,
– to further develop critical thinking and analytic skills,
– to act as an informed consumer,
– and to know when you need to hire outside statistical help.
– Even Florence Nightingale did it!

But I’m never going to do research!

• Increasing emphasis on evidence-based practice
– Informs nurses’ decisions and actions
– Empowers nurses to make clinical decisions which benefit their patients, whether individual or community
– Friendly nursing research environment required for Magnet status
– Increases recognition for nursing contribution in health care and policy

Why nursing research

• The characteristics we are measuring
– Vary according to the population, patient, event, intervention
• Data levels of measurement help us measure the variables
– Nominal
– Ordinal
– Interval
– Ratio

Variables

• Sometimes called categorical or qualitative
– Permissible statistics: mode, chi-squared
– Lowest form of data, least sophisticated
• Names
• Characteristics/descriptive (e.g., pain: throbbing, stabbing, dull)
• Letters (e.g., M/F, Y/N)
• Numbers may be assigned to designate categories but have no numerical meaning (e.g., M=1, F=2)

Data levels of measurement: Nominal

– Permissible statistics: median, percentile
– Can’t be added
• Rank order – 1st, 2nd, 3rd
• Rating – Pain rating 0-10
• Likert scale

Data Levels of measurement: Ordinal

• Dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied
– No numerical data to quantify
– Answers run on a continuum

Likert scales

• Permissible statistics: mean, SD, correlation, regression, ANOVA
– Rank ordering of objects.
– Equivalent distance between each measurement
– The Fahrenheit scale is a clear example of the interval scale of measurement
– Arbitrary zero does not represent the lowest value

Data Levels of measurement: Interval

• Highest level of measurement
• Permissible statistics: same as interval plus more
• The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units.
• Has an absolute zero (no numbers exist below zero). Very often, physical measures will represent ratio data (for example, height and weight). Example: measuring the length of a piece of wood in centimeters: you have quantity, equal units, and the measure can’t go below zero centimeters.

Data Levels of measurement : Ratio

Subject   Ratio level   Interval level   Ordinal level   Nominal level
Cookie    180           70               6               2
Bunny     110           0                1               1
Frosty    165           55               4               2
Tootsie   130           20               3               1
Candy     175           65               5               2
Fluffy    115           5                2               1

Examples of data levels of measurement

• The colors of M&M candies would be which type of measurement?

A. Interval
B. Nominal
C. Ordinal
D. Ratio

Question 1

• Height, weight, lab test results, and age are examples of which type of data measurement?

A. Ratio
B. Nominal
C. Interval
D. Ordinal

Question 2

• The Rankin scale is used to assess functional status after stroke. Measurements are:

• 0 = no symptoms at all
• 1 = symptoms with no significant disability
• 2 = slight disability; unable to carry out previous activities
• 3 = moderate disability; needs some assistance, can walk alone
• 4 = moderately severe disability; unable to walk or attend to bodily functions without assistance
• 5 = severe disability; bedridden, incontinent, needs constant nursing care
• 6 = dead

Rankin Scale

• The Rankin scale is which type of measurement?

A. Ratio
B. Nominal
C. Interval
D. Ordinal

Question 3

Section II. Descriptive Statistics and Intro to the Normal Distribution

Descriptive Statistics = Describing the Data

• For any study, consider what parts would be useful to describe in numbers
– Sample
– Variables of interest

• In any study where the data are numerical, data analysis should begin with descriptive statistics.

• The appropriate choice of descriptive statistics depends on the level of data that was collected!

Types of Summary Statistics

• Frequency distributions
– Ungrouped
– Grouped
– Percentages
• Measures of central tendency
• Measures of dispersion

Ungrouped Frequency Distributions

• The number of times something happened.
• Used with categorical data (ordinal, nominal)
• As simple as a tally or count

http://www.gigawiz.com/histograms.html

Example

• Using ungrouped frequency distributions to describe research variables

• How often newborns fit each demographic criterion or a birth attendant reported a particular behavior (e.g., using CHG vs. not)

From Rhee et al. (2008). Maternal and birth attendant hand washing and neonatal mortality in Southern Nepal. Archives of Pediatrics and Adolescent Medicine, 162(7), 603-608

Grouped Frequency Distributions

• The number of times something happened.
• Used to break continuous data (often things like age, weight, income) into groups.
– You will always lose some information by doing this
– There are conventions for groupings
• Groups ideally have equal ranges, but you may see open-ended groups at the ends of the data spectrum
• All data points must fit into a group
• Not too many, not too few (you don’t want to lose patterns in the data)
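The grouping conventions above can be sketched in a few lines of Python. This is only an illustration: the ages are made up, and the 20-year group width and the open-ended "60+" group are assumptions chosen for the example, not values from any study cited here.

```python
from collections import Counter

# Hypothetical ages in years (illustrative only).
ages = [19, 23, 27, 31, 34, 35, 42, 47, 51, 58, 63, 71]

def age_group(age, width=20, top=60):
    """Label an age with an equal-width group; ages at or above `top` share one open-ended group."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Tally how many ages fall into each group.
grouped = Counter(age_group(a) for a in ages)

for label in ["0-19", "20-39", "40-59", "60+"]:
    print(label, grouped[label])
```

Every age lands in exactly one group, but the individual values inside each group are no longer visible, which is the information loss the slide mentions.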

Percentage Distributions

• What percentage of the time something happened.
– Useful when comparing to studies with different numbers of participants
– Often presented with other frequency distributions in the following format: No. (%)
– Often graphically represented using pie charts, bar charts
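As a rough sketch of the No. (%) format, the snippet below tallies a nominal variable and reports each count with its percentage. The responses are hypothetical, and `no_pct` is a helper name invented for this example.

```python
from collections import Counter

# Hypothetical yes/no responses (nominal data), for illustration only.
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]

counts = Counter(responses)
total = len(responses)

def no_pct(category):
    """Format one category as 'No. (%)', e.g. '7 (70.0%)'."""
    n = counts[category]
    return f"{n} ({100 * n / total:.1f}%)"

for category in ["yes", "no"]:
    print(category, no_pct(category))
```

Reporting the percentage alongside the count is what lets a reader compare this sample with a study that enrolled a different number of participants.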

Example

• Questionnaires given to parents of under-immunized children.

• The tables indicate the number and percentage of participants selecting each response.

Luthy, K., Beckstrand, R., & Peterson, N. (2009). Parental hesitation as a factor in delayed Childhood Immunization

Question

• Which measure of central tendency is being used here to summarize participants’ ages:
– A- Mode
– B- Median
– C- Mean
– D- Standard deviation

Measure of Central Tendency

• Used to describe a “typical” result or the middle of the dataset

• Most common measures:
– Median
– Mode
– Mean

Median

• Literally the number in the middle of the dataset (when there is an odd number of scores)
– 50% of scores above and 50% of scores below this point (known as the 50th percentile)
• Most appropriately used for ordinal data
• Because the focus is on the middle score, the median is less affected by outliers

Mode

• The most common score(s)
– May or may not be in the “middle” but is always a number in the dataset
– Most appropriate for nominal data (e.g., most answers are “yes”).

Mean

• = Sum of Scores / Total # of Scores
– Also known as an average
• Data must be continuous to generate a mean (interval and ratio level data only!)
• Most affected by outliers
• May be denoted in a number of ways (M, X̄, mean)
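To see how the three measures behave, here is a minimal sketch using Python's built-in `statistics` module. The scores are invented for illustration; the last value is a deliberate outlier.

```python
import statistics

# Hypothetical scores; the final value (40) is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score of the sorted data
mode = statistics.mode(scores)      # most common score

# Recompute without the outlier to see which measure moves the most.
trimmed = scores[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(mean, median, mode)
print(round(mean_trimmed, 2), median_trimmed)
```

With the outlier included the mean is 9 while the median stays at 4; dropping the outlier pulls the mean down to about 3.83 but moves the median only to 3.5.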

Measures of Variance

• How spread out is the data? Or how different are the scores from one another?
– Range
• Subtract the lowest number from the highest number in the set. Tells the total distance between ends of the data set.
– Variance (interval or ratio levels only!)
• Computed mathematically (the average of the squared deviations from the mean) and provides data on dispersion or spread
– Standard deviation (interval or ratio levels only!)
• Relates dispersion of values to the mean
• Is the square root of the variance
• Usually reported as SD
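A small sketch of the three dispersion measures, again with made-up scores. It uses the sample formulas (dividing by n - 1), which is an assumption about how the data would be treated; population formulas divide by n instead.

```python
import math
import statistics

# Hypothetical interval/ratio-level scores (illustrative only).
scores = [4, 8, 6, 5, 3, 7, 9]

data_range = max(scores) - min(scores)   # highest minus lowest
variance = statistics.variance(scores)   # sample variance (n - 1 in the denominator)
sd = statistics.stdev(scores)            # sample standard deviation

print(data_range, round(variance, 3), round(sd, 3))

# The standard deviation is the square root of the variance.
print(math.isclose(sd, math.sqrt(variance)))
```

Because the SD is expressed in the same units as the original scores, it is usually the value reported next to the mean.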

Normal Distribution

• In a true normal distribution, the mean, median, and mode are equal
• No real distribution exactly fits
• However, in most sets of data, the distribution is similar to the normal curve

Normal Distribution
• Unique properties
– All possible values fall under the curve
– Probability of any score occurring is related to its location under the curve
• Important SDs:
– 68.3% of all values within 1 SD from mean
– 95.5% within 2 SD from mean
– 99.7% within 3 SD from mean

[Figure: normal curve with +/- 1 SD and +/- 2 SD marked]
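The percentages for 1, 2, and 3 SDs can be checked against the theoretical curve with `statistics.NormalDist` from the Python standard library. This is a sketch for the standard normal distribution (mean 0, SD 1); `share_within` is a helper name made up for the example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
z = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```

The exact two-SD figure sits between 95.4% and 95.5%, which is why both roundings appear in textbooks.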

Section III.
• Stat theory
• Hypotheses
• Type 1 and 2 Errors
• Level of Significance
• Power

Probability Theory (p values)
• Deductive
• Used to explain:
– Extent of a relationship
– Probability of an event occurring
– Probability that an event can be accurately predicted
• Expressed as lowercase p, with values written as decimals that can be read as percents

Probability

• If probability is 0.23, then p = 0.23.
• There is a 23% probability that a particular event will occur.
• For a result to be considered statistically significant, probability is usually expected to be p < 0.05.
• Example?
• Patients who go into cardiac arrest in the operating room have a 5% chance of death.

Decision Theory

• Inductive reasoning
• Assumes all groups in a study are the same
• Up to the researcher to provide evidence (NEVER use the word PROVE!) that there really is a difference
• To test the assumption of no difference, a cutoff point is selected before analysis.

Hypothesis

• Statement of the expected outcome

• Example?
• Nursing students who study in the library have higher GPAs than nursing students who study in their dorm rooms/apartments.

Characteristics of a Hypothesis
• Testable
• Logical
• Directly related to the research problem
• Theoretically or factually based
• States relationship between variables
• Stated so that it can be accepted or rejected

Research Hypothesis
• Directional
– explains and predicts the direction and existence of a specific relationship
– relationship will be either positive or negative
– more specific than the non-directional hypothesis
– cause-and-effect hypothesis
• Non-directional

Null hypothesis

• Statistical statement that there is no difference between the groups under study

Cutoff Point

• level of significance or alpha (α)

• Point at which the results of statistical analysis are judged to indicate a statistically significant difference between groups

• For most nursing studies, level of significance is 0.05.

Cutoff Point (cont’d)

Absolute

NO “CLOSE ENOUGH” - If the value is only a fraction above the cutoff point, the null hypothesis is not rejected and the groups are treated as coming from the same population.

A result significant at 0.001 is not considered more significant than one that just meets the cutoff point.

Inference

A conclusion/judgment based on evidence

Judgments are made based on statistical results

Statistical inferences must be made cautiously and with great care

Generalization

• A generalization is the application of information that has been acquired from a specific instance to a general situation.

• Example?

Normal Curve
A theoretical frequency distribution of all possible values in a population.
Levels of significance and probability are based on the logic of the normal curve.

Normal Curve

One-Tailed Test (cont’d)

Two-Tailed Test

Type I and Type II Errors

Type I error occurs when the researcher rejects the null hypothesis when it is true.
The results indicate that there is a significant difference, when in reality there is not.

Type II error occurs when the researcher regards the null hypothesis as true but it is false.
The results indicate there is no significant difference, when in reality there is a difference.

Reasons for Errors

• Type I
– Greater at the .05 level than at the .01 level
• Type II
– Greater at the .01 level than at the .05 level
– Flaws in research methods
• Multiple variables interact
• Precision of instruments
• Small samples

Statistical Power (AKA Power Analysis)

• DEF: the probability of rejecting the null hypothesis when it should have been rejected OR

• Probability that a statistical test will detect a significant difference that exists

Power

• Maneuver to increase control over:

– Types of errors

– CORRECT DECISIONS

Power and Risk for Type II Error

Power analysis = 0.80 minimum

Influenced by sample size
As sample increases so does power

Influenced by effect size – degree to which a phenomenon is present in a population
The larger the true difference between the two groups the greater the power

Question #1
The level of significance usually set in nursing studies is at either:

a. .5 or .1
b. .05 or .01
c. .005 or .001

Question #2

Which of the following is TRUE about the level of significance?

a. ensures that findings will be correct 95% of the time if an alpha value of less than .05 was used

b. refers to a statistic calculated during computer analysis

c. represents the risk the researcher is willing to take in making a type I error and is established before data is analyzed

Question #3

There is a greater risk of a Type I error with a 0.05 level of significance than with a 0.01 level of significance.
A. True
B. False

Section IV.
• Statistical Significance
• Clinical Significance
• Reliability
• Validity
• Generalizability & Inference

Statistical Significance
• Known as alpha (α)

• The threshold at which statistical significance is reached.

Cut Off Point

• Referred to as level of significance or alpha (α)
• Point at which the results of statistical analysis are judged to indicate a statistically significant difference between groups

• For many nursing studies, level of significance is 0.05.

• Typically written as α = 0.05

Cutoff Point (cont’d)

• The cutoff point is absolute.

• If the value obtained is only a fraction above the cutoff point no meaning can be attributed to the differences between the groups.

Levels of Acceptable Significance

• 0.05
• 0.01
• 0.005
• 0.001

Clinical Significance

• Findings can have statistical significance but not clinical significance.

• Related to practical importance of the findings
• No common agreement in nursing about how to judge clinical significance
– Difference sufficiently important to warrant changing the patient’s care?

Clinical Significance (cont’d)

• Who should judge clinical significance?
– Patients and their families?
– Clinician/researcher?
– Society at large?

• Clinical significance is ultimately a value judgment.

Simpson & James (2005) Effects of Immediate Vs. Delayed Pushing During Second-Stage Labor….

Significant differences between groups:
• Fetal oxygen desaturation during second-stage labor (immediate: M=12.5; delayed: M=4.6), p = .001
• Variable decelerations in fetal heart rate (immediate: M=22.4; delayed: M=15.6), p = .02
There were no differences in length of labor, method of birth, Apgar scores, or umbilical cord gases.

Question: A statistically significant finding means that:

a. Findings are clinically important and valuable.
b. Interventions should be used in clinical practice.
c. Obtained results are not likely to have been due to chance.
d. Results will be the same if the study is repeated with another sample.

Question: A researcher reports that the results of a study were not statistically significant. How is this to be interpreted?

a. Intervention was not strong enough to make a difference.

b. Researcher does not have enough evidence to reject Ho.

c. Researcher’s logic or conceptualization in setting up the study was faulty.

d. Topic is of no further interest to nurse researchers or clinicians.

Testing Reliability of Measurement

• Examine reliability of study scales before using them.

• The degree of consistency with which an instrument measures a construct.

Reliability Coefficient

• A quantitative index
• Usually ranges from .00 to 1.00
• Provides an estimate of how reliable an instrument is
• Should be at least 0.70
• Most common one is Cronbach’s alpha

Hollen et al. (1994) Measurement of QOL in patients with.…Psychometric assessment of the LCSS.

LCSS has good reliability
• Internal consistency of α = 0.82
• High reproducibility/stability (test-retest reliability: n=52, r>0.75)
• High repeated inter-rater agreement/equivalence among experts (95-100% agreement)

Validity

1. The degree to which inferences made in a study are accurate = Internal Validity

2. The degree to which results can be generalized = External Validity

3. The degree to which an instrument measures what it is intended to measure = Validity

Hollen et al. (1994) Measurement of QOL in patients with.…Psychometric assessment of the LCSS.

Validity has been established for the LCSS

• Content validity ~ expert panel
• Convergent validity ~ similar QOL tool
• Construct validity ~ unrelated tools
• Criterion-related validity ~ correlation with a “gold” standard (e.g. Sickness Impact Profile)

Inference

•A conclusion or judgment based on evidence

•Judgments are made based on statistical results

•Statistical inferences must be made cautiously and with great care

Generalization

• A generalization is the application of information that has been acquired from a specific instance to a general situation.

• Generalizing requires making an inference.

• Both inference and generalization require the use of inductive reasoning.

Generalization (cont’d)

• An inference is made from a specific case and extended to a general truth, from a part to a whole, from the known to the unknown.

• In research, an inference is made from the study findings to a more general population.

Simpson & James (2005) Effects of Immediate Vs. Delayed Pushing During Second-Stage Labor….

“Results from this study suggest that delayed second-stage pushing until the urge to push and pushing with the open-glottis technique in nulliparous women with epidural anesthesia is more favorable for physiologic fetal well-being as measured by FSpO2 (p. 155).”

“The benefits of less fetal oxygen desaturation ….appear to outweigh any disadvantages of a longer second stage (p. 155).”

Question: Which of the following questions relates to generalization?

a. Are the findings generally significant to people in the study?

b. Can these findings be applied to other groups or settings?

c. Does the degree of control in the study allow for statistical significance?

d. How many alternative explanations can be proposed?

Section V. Common Statistical Tests

• Independent T-Test
• One-Way ANOVA
• Chi-Square
• Correlation
• Regression

Independent T-Test
• To compare means between two groups
• The continuous variable is measured once.

For example:
Research Question
Is there a difference in self-efficacy for pain management in week 10 between participants with Fibromyalgia (FM) in the guided imagery group and those in the standard care group?

Hypotheses
Ho: µGI - µSC = 0    α = 0.05
Ha: µGI - µSC ≠ 0

Independent T-Test (Cont’d)
Tests of assumptions with the sample
• Independent groups (no overlap).
• Dependent variable is continuous (interval or ratio level).
• Normal distribution.
• Homogeneity of Variance is met.

Group Statistics: Self efficacy for pain management in week 10

Group                 N    Mean      Std. Deviation   Std. Error Mean
Guided Imagery (GI)   24   64.5833   22.69249         4.63209
Standard Care (SC)    24   49.8333   20.30992         4.14574

Independent T-Test (Cont’d)
Ho: µGI - µSC = 0    α = 0.05    t = 2.373
Ha: µGI - µSC ≠ 0    p = 0.011 = 1.1%

Conclusion:
There is a difference in self-efficacy in week 10 between participants with Fibromyalgia (FM) in the guided imagery group and those in the standard care group. In our sample, in week 10, participants in the guided imagery group had greater self-efficacy than those in the standard care group.
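The reported t value can be reproduced from the Group Statistics table alone. The sketch below uses the pooled-variance (equal variances assumed) form of the independent t-test, which matches the homogeneity-of-variance assumption listed above. The critical value 2.013 is an approximate two-tailed table value for df = 46 at α = 0.05 and is an assumption of this example; the standard library has no t distribution, so the sketch compares t with that cutoff instead of computing a p value.

```python
import math

# Summary statistics copied from the Group Statistics table.
n_gi, mean_gi, sd_gi = 24, 64.5833, 22.69249   # Guided Imagery
n_sc, mean_sc, sd_sc = 24, 49.8333, 20.30992   # Standard Care

# Pooled variance weights each group's variance by its degrees of freedom.
df = n_gi + n_sc - 2
pooled_var = ((n_gi - 1) * sd_gi**2 + (n_sc - 1) * sd_sc**2) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n_gi + 1 / n_sc))

t = (mean_gi - mean_sc) / se_diff

# Approximate two-tailed critical t for df = 46, alpha = 0.05 (table lookup; assumption).
T_CRIT = 2.013

print(f"t = {t:.3f}, df = {df}")
print("reject Ho" if abs(t) > T_CRIT else "fail to reject Ho")
```

Since |t| exceeds the critical value, the sketch reaches the same decision as the slide: reject Ho at α = 0.05.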

One-Way Analysis of Variance (ANOVA)
• Tests for differences between means.
• More flexible than other analyses in that it can examine data from two or more groups.

For example:
Research Question
Is there a difference in depression scores depending on types of elderly housing and care (independent living, assisted living, and nursing care)?

Hypotheses
Ho: µIL = µAL = µNC    α = 0.05
Ha: At least 2 groups differ

ANOVA (cont’d)

Variables                      Independent Living (n=16)   Assisted Living (n=19)   Nursing Care (n=17)   p
Depression scores, Mean (SD)   12.25 (7.594)               12.84 (7.274)            16.44 (8.043)         0.234 (> 0.05)

Tests of assumptions
– Independent groups
– Continuous dependent variable
– Normal distribution
– Homogeneity of Variance is met

Conclusion:
There is no statistically significant difference in depression scores depending on types of elderly housing and care (independent living, assisted living, and nursing care).

If significant, Post Hoc tests are used to determine the location of differences.
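As a rough check on the ANOVA result, the F statistic can be rebuilt from the group sizes, means, and SDs in the table. The critical value 3.19 is an approximate table value for F(2, 49) at α = 0.05 and is an assumption of this sketch, used in place of an exact p value.

```python
# (n, mean, SD) for each housing type, copied from the ANOVA table.
groups = {
    "Independent Living": (16, 12.25, 7.594),
    "Assisted Living": (19, 12.84, 7.274),
    "Nursing Care": (17, 16.44, 8.043),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups: how far each group mean sits from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups: spread of scores inside each group, recovered from the SDs.
ss_within = sum((n - 1) * sd**2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Approximate critical F for (2, 49) df at alpha = 0.05 (table lookup; assumption).
F_CRIT = 3.19

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print("reject Ho" if f_stat > F_CRIT else "fail to reject Ho")
```

F comes out near 1.50, well under the cutoff, which agrees with the reported p = 0.234 and with the decision not to run post hoc tests.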

Chi-Square Test of Independence

• Used with nominal or ordinal data
• Hypothesis:
– Ho: There is no difference in Y depending on X
– Ha: There is a difference in Y depending on X
• Assumptions:
– Frequency data
– Adequate n: > 5 expected per cell; this can be violated in up to 20% of cells.

Research Question
Is there a difference in depression at week 12 depending on the helplessness category - low or high?

Hypotheses
• Ho: There is no difference in depression at week 12 depending on the helplessness category - low or high.
• Ha: There is a difference in depression at week 12 depending on the helplessness category - low or high.

Example of Chi-Square Test

Crosstabulation: Depression (cat.) at week 12 by AHI

                                   AHI Low   AHI High   Total
Not Depressed   Count              26        14         40
                Expected Count     22.3      17.7       40.0
                % within AHI       89.7%     60.9%      76.9%
Depressed       Count              3         9          12
                Expected Count     6.7       5.3        12.0
                % within AHI       10.3%     39.1%      23.1%
Total           Count              29        23         52
                Expected Count     29.0      23.0       52.0
                % within AHI       100.0%    100.0%     100.0%

χ² = 5.99, df = 1, p = 0.014 or 1.4%
AHI = Arthritis Helplessness Index

Conclusion:
There is a difference in depression at week 12 depending on the helplessness category - low or high. Those people in the high helplessness group had a higher level of depression compared to those in the low helplessness group.
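The expected counts and the chi-square statistic in the crosstabulation can be recomputed from the four observed counts. This sketch uses the uncorrected Pearson chi-square. The p value line relies on a shortcut that holds only for df = 1 (a 2×2 table), where the upper-tail probability equals `erfc(sqrt(chi2 / 2))`; larger tables would need a full chi-square distribution.

```python
import math

# Observed counts from the crosstabulation.
# Rows: Not Depressed, Depressed. Columns: AHI Low, AHI High.
observed = [[26, 14],
            [3, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for a cell = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this identity is valid for df = 1 only.
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject Ho" if p < 0.05 else "fail to reject Ho")
```

The smallest expected count is about 5.3, so the adequate-n assumption holds here, and the p value falls below the 0.05 cutoff.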

Pearson Product-Moment Correlation

• Tests for the presence of a relationship between two variables
– Called bivariate correlation
• Types of correlation are available for all levels of data. Best results are obtained using interval data.
• Results
– Nature of the relationship (positive or negative)
– Magnitude of the relationship (–1 to +1)
– Strength of r: High = > 0.70; Moderate = 0.30-0.69; Low = < 0.30
– Testing the significance of a correlation coefficient
– The r² is the proportion of variation shared by the two variables, expressed as a percentage.
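A minimal sketch of how r and r² are computed, using invented paired scores. The `strength` helper applies the High/Moderate/Low bands listed above; treating exactly 0.70 as High is an assumption, since the slide leaves that boundary open.

```python
import math

# Hypothetical paired measurements (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squared deviations from each mean.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson product-moment correlation
r_squared = r**2                    # shared variation as a proportion

def strength(value):
    """Label the magnitude of r using the bands from the slide."""
    size = abs(value)
    if size >= 0.70:
        return "High"
    if size >= 0.30:
        return "Moderate"
    return "Low"

print(f"r = {r:.2f} ({strength(r)}), r squared = {100 * r_squared:.0f}%")
```

The sign of r gives the nature of the relationship and its absolute value gives the magnitude; whether r is statistically significant is a separate question that depends on sample size.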

Scatterplots and Correlation Coefficients

[Figure: scatterplots showing maximum positive correlation (r = 1.0), maximum negative correlation (r = -1.0), and strong correlation with an outlier (r = 0.71)]

Correlation Results
QUESTION

Which one is significant if the level of significance used in this test is 0.01?

A. r = 0.56 (p = 0.03)
B. r = –0.13 (p = 0.2)
C. r = 0.65 (p = 0.002)
D. r = 0.33 (p = 0.04)

Regression Analysis
• Used when one wishes to predict the value of one variable based on the value of one or more other variables
• For example:
– one might wish to predict the possibility of passing the credentialing exam based on grade point average (GPA) from a graduate program.
– Or to predict the length of stay in a neonatal unit based on the combined effect of multiple variables such as gestational age, birth weight, number of complications, and sucking strength.

Regression Analysis (cont’d)
• Assumptions:
– Must have Independent Variable & Dependent Variable
– Both variables must be continuous
– Normally distributed data
– Linear relationship (scatter plot)
• The outcome of analysis is the regression coefficient R.
• When R is squared, it indicates the amount of variance in the data that is explained by the equation.
• The R2 is also called the coefficient of multiple determination.

Regression Results

• R2 = 0.63
• This result indicates that 63% of the variance in length of stay can be predicted by the combined effect of age, weight, complications, and sucking strength.
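To show where an R² value comes from, here is a sketch of simple (one-predictor) least-squares regression on invented data. It is not the multi-variable model behind the 63% figure; the variable names and values are hypothetical.

```python
# Hypothetical predictor and outcome values (illustrative only).
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

mean_x = sum(predictor) / len(predictor)
mean_y = sum(outcome) / len(outcome)

# Least-squares slope and intercept of the best-fit line.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
s_xx = sum((x - mean_x) ** 2 for x in predictor)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R squared = 1 - (unexplained variation / total variation).
predicted = [intercept + slope * x for x in predictor]
ss_residual = sum((y - p) ** 2 for y, p in zip(outcome, predicted))
ss_total = sum((y - mean_y) ** 2 for y in outcome)
r_squared = 1 - ss_residual / ss_total

print(f"outcome = {intercept:.1f} + {slope:.1f} * predictor")
print(f"R squared = {r_squared:.2f}")
```

Here the line explains 60% of the variation in the outcome; an R² of 0.63 is read the same way, with the remaining variation left unexplained by the predictors in the model.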

Overlay of Scatterplot and Best-Fit Line

Conclusion

• Statistical test selection depends on the research question.

• Some research questions can be answered by using basic statistical tests, while others require advanced statistical tests.
