medical statistics basic concept and applications [square one]

Medical Statistics 2013 Dr Tarek Tawfik Amin

Upload: tarek-tawfik-amin

Post on 07-May-2015


DESCRIPTION

Provide the basic concept and application of bio-statistics using a practical model coupled with the essential theoretical background.

TRANSCRIPT

Page 1: Medical statistics Basic concept and applications [Square one]

Medical Statistics2013

Dr Tarek Tawfik Amin

Page 2: Medical statistics Basic concept and applications [Square one]

Introduction

- Questions
- Why statistics?
- The process
- The resources

Page 3: Medical statistics Basic concept and applications [Square one]

How?

• Book: Statistics at Square One, 11th ed. (Campbell and Swinscow)

• SPSS practical sessions: PASW guide

• Practical sessions using SPSS v. 17.0

Page 4: Medical statistics Basic concept and applications [Square one]

Statistics “an overview”

Data

Population

Sample

Analysis

Interpretation

Information

Parameters

Statistics

Reference range

Researches

Statistical analysis

Page 5: Medical statistics Basic concept and applications [Square one]

Statistical analysis

Variables (data)

- Qualitative (categorical): nominal, ordinal
- Quantitative (numerical): interval/ratio; discrete or continuous

Statistical analysis

- Descriptive or inferential: depends on the sample(s) and the objectives of analysis
- Descriptive tools: tables, graphs, measures

Page 6: Medical statistics Basic concept and applications [Square one]

I-Descriptive Statistics

Goals

- Summarizing
- Overview
- Data checking

Page 7: Medical statistics Basic concept and applications [Square one]

PATNR  AGE  SEX  SMOKE  HEIGHT  WEIGHT  SBP1  SBP2  INSULIN  CHOL  HBA1C  DIABDU  DEAD
1      57   0    0      177     98      140   154   0        6.30  7.62   5       #NULL!
2      74   1    0      172     69      150   145   1        5.10  8.30   11      0
3      38   1    0      155     70      120   126   0        6.50  11.00  2       #NULL!
4      73   1    0      165     72      180   157   0        5.80  7.00   21      0
5      53   1    2      174     109     140   119   1        6.80  10.60  7       0
6      74   1    0      171     83      151   145   0        6.25  7.62   7       0
7      81   0    2      175     60      140   113   0        6.50  6.40   6       0
8      86   1    0      164     59      140   158   0        5.20  5.30   4       0
9      78   0    1      171     83      151   148   0        5.60  5.90   1       1
10     78   1    0      171     83      151   159   1        5.00  8.00   23      1
11     91   0    0      171     83      151   140   0        4.30  9.70   4       1
12     77   0    2      176     87      170   198   0        6.40  6.60   7       2
13     77   1    0      171     83      151   152   0        5.20  4.90   26      1
14     84   0    0      171     62      160   148   0        7.00  7.80   8       1
15     72   1    0      154     63      145   148   0        6.20  7.80   0       1

diabIB

Page 8: Medical statistics Basic concept and applications [Square one]

I-Tables

Frequency and contingency tables

Frequency table: SEX

          Frequency  Percent  Valid Percent  Cumulative Percent
male      145        52.2     52.2           52.2
female    133        47.8     47.8           100.0
Total     278        100.0    100.0

Contingency table: smoking history * SEX crosstabulation (count)

                   male   female  Total
never              26     110     136
stopped smoking    64     14      78
yes                55     9       64
Total              145    133     278

Tables can summarize counts, frequency (categorical), measures (numerical)

For comparison (2 or more variables)

Page 9: Medical statistics Basic concept and applications [Square one]

Table 3 Daily servings of calcium and vitamin D rich foods in relation to body mass index classification of the included adults.

Values are median (mean ±SD) unless stated otherwise.

Food items (servings/day) *                       Obese (N=91)          Non-obese (N=125)     P value a
Milk                                              0.52 (0.71±0.3)       0.65 (0.88±0.7)       0.031
Milk beverage                                     0.45 (0.59±0.4)       0.35 (0.53±0.4)       0.279
Milk in cereals                                   0.20 (0.33±0.2)       0.50 (0.58±0.4)       0.001
Milk in coffee or tea                             0.15 (0.25±0.6)       0.20 (0.23±0.6)       0.790
- Total milk                                      0.90 (1.03±0.3)       1.20 (1.34±0.7)       0.001
Yoghurt                                           0.10 (0.12±0.6)       0.20 (0.14±0.5)       0.790
Cheese                                            0.20 (0.24±0.9)       0.20 (0.29±0.8)       0.661
Ice cream                                         0.15 (0.14±0.6)       0.06 (0.09±0.3)       0.422
- Total dairy                                     0.25 (0.45±0.6)       0.30 (0.43±0.7)       0.826
Tuna (canned)                                     0.05 (0.03±0.1)       0.03 (0.04±0.3)       0.761
Fish                                              0.15 (0.19±0.7)       0.10 (0.18±0.5)       0.902
Half cooked fish                                  0.06 (0.11±0.5)       0.25 (0.27±0.6)       0.029
Shrimp/oyster                                     0.05 (0.08±0.1)       0.05 (0.06±0.1)       0.149
Eggs                                              0.85 (0.81±1.1)       0.80 (0.76±0.7)       0.797
Liver (including chicken livers)                  0.02 (0.04±0.4)       0.05 (0.06±0.3)       0.834
Others                                            0.20 (0.23±0.3)       0.40 (0.55±0.5)       0.549
- Dietary vitamin D (IU/day)                      111.6 (118.1±73.5)    123.7 (132.2±67.4)    0.034
  Low dietary intake c (<200 IU/day): No. (%)     56 (62.2)             47 (37.6)             0.003 b
- Dietary calcium (mg/day)                        660.0 (698.8±261.9)   692.0 (717.9±245.9)   0.223
  Low calcium intake d (<1000 mg/day): No. (%)    51 (56.7)             49 (39.2)             0.011 b

Page 10: Medical statistics Basic concept and applications [Square one]

Assignment I

Table 1 Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                       Total (N=278)
1 - Men (%)                                                         52.2
2 - Insulin users (%)                                               25.5
3 - Smokers (%)                                                     23.0
4 - Ex-smokers (%)                                                  28.1
5 - Non-smokers (%)                                                 48.9
6 - Age in years (mean ±SD)                                         67.24 ±11.74
7 - Systolic blood pressure at starting point, mmHg (mean ±SD)      151.20 ±22.00
8 - Systolic blood pressure at two years, mmHg (mean ±SD)           153.83 ±29.1
9 - Duration of diabetes (median, quartiles 1-3)                    6.0 (2.75-12.25)
10 - Missed values                                                  0.0

Page 11: Medical statistics Basic concept and applications [Square one]

II-Graphs

Goals

- Impression
- Comparison
- Data checking
- Clustering
- Trend

Page 12: Medical statistics Basic concept and applications [Square one]

II- Graphs

Figure 1: Outcomes of the included diabetic patients (1996)

[Pie chart with categories: alive, died from CVD, other cause of death, missing]

Selection of graphs depends on:

1- Types of variables (categorical or numerical)
2- Number of variables
3- Objectives

Figure 2: Smoking status of the included diabetic patients

[Bar chart: percent of patients by smoking history (yes, stopped smoking, never)]

Page 13: Medical statistics Basic concept and applications [Square one]

Figure 3: Total cholesterol level in diabetic patients 1996 (in mmol/l)

[Histogram of total cholesterol: Mean = 6.25, Std. Dev = 1.33, N = 278]

Histograms are used for numerical variables.

Page 14: Medical statistics Basic concept and applications [Square one]

Figure 4: Systolic blood pressure at starting point among diabetic patients 1996 (mmHg)

[Boxplot of systolic blood pressure at start by SEX: female (N = 133), male (N = 145); a few outlying cases are flagged]

Page 15: Medical statistics Basic concept and applications [Square one]

Figure 6: Total cholesterol level in relation to gender and smoking status among diabetic patients 1996

[Error-bar chart: 95% CI of total cholesterol (mmol/l) by SEX (female, male) and smoking history (never, stopped smoking, yes)]

Page 16: Medical statistics Basic concept and applications [Square one]

Figure 7: Duration of diabetes among the included patients 1996 (in years)

[Histogram of duration of diabetes: Mean = 7.9, Std. Dev = 6.96, N = 278]

Checking for normality

- Mode = 1, Median = 6.0, Mean = 7.9
- In a normal distribution the mode, median and mean coincide
- Look for outliers

Page 17: Medical statistics Basic concept and applications [Square one]

III-Measures (numerical variables)

Central tendency (how the data aggregate around a central point)

- Mean
- Median
- Mode
- Percentiles

Dispersion (how the data vary)

- Range (max-min)
- Interquartile range
- Variance
- Standard deviation
- Coefficient of variation

Page 18: Medical statistics Basic concept and applications [Square one]

Central Tendency

Mean = sum of the observations / their number = (x1+x2+x3+...+xn)/n. Affected by extreme values.

Mode = the most frequently occurring value in a set of observations.

Median = the middle value, which divides the ordered data set 50/50. Not affected by extreme values.

Page 19: Medical statistics Basic concept and applications [Square one]

Age of sample: 3 7 37

Median = 7, Mean = (3+7+37)/3 = 15.7

Age of sample: 3 7 11

Median = 7, Mean = (3+7+11)/3 = 7
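The two age samples above can be checked with a few lines of Python. This is only an illustrative sketch using the standard `statistics` module; the variable names are made up for the example.

```python
import statistics

# The two small age samples from the slide (years).
ages_with_extreme = [3, 7, 37]
ages_without_extreme = [3, 7, 11]

for ages in (ages_with_extreme, ages_without_extreme):
    mean_age = statistics.mean(ages)      # pulled towards extreme values
    median_age = statistics.median(ages)  # middle value of the ordered data
    print(ages, "mean =", round(mean_age, 1), "median =", median_age)
```

The median stays at 7 in both samples, while the mean moves from 7 to about 15.7 when the single extreme value 37 is present.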

Page 20: Medical statistics Basic concept and applications [Square one]

Dispersion

1 6 8 10 16 17 23 43 531

Range = 53-1 = 52. Affected by extreme values.

Median = 13 = 50th percentile (50% of the data lie below it)

3rd quartile = 75th percentile (75% of the data lie below it)

1st quartile = 25th percentile (25% of the data lie below it)

Interquartile range = 3rd quartile - 1st quartile = 23-6 = 17

The IQR is not affected by extreme values.

Page 21: Medical statistics Basic concept and applications [Square one]

Standard deviation and variance

3 7 17

Sample of 3, their age in years

Mean age=(3+7+17)/3=9

Deviations from the mean (9): 3-9 = -6, 7-9 = -2, 17-9 = +8

The sum of the differences between the individual values and the mean = 0, so the mean deviation = 0.

To overcome the 0, sum the squared differences and divide by (number - 1). This is the variance:

Variance = [(3-9)² + (7-9)² + (17-9)²] / (3-1) = (36+4+64)/2 = 52

The amount of dispersion around the mean = 52 years² (wrong scale).

Hence we need to convert back to the usual (natural) scale by using the standard deviation:

Standard deviation = √Variance = √52 = ±7.2 years

Page 22: Medical statistics Basic concept and applications [Square one]

The sample disperses around the mean (=9 years) by 7.2 years on both directions
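As a quick check of the arithmetic, the same sample can be run through Python's standard `statistics` module, whose `variance` and `stdev` functions use the n-1 denominator described above. The variable names are illustrative only.

```python
import statistics

ages = [3, 7, 17]  # sample of 3, age in years

mean_age = statistics.mean(ages)
deviations = [x - mean_age for x in ages]   # -6, -2, +8
sample_variance = statistics.variance(ages)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(ages)           # square root of the variance

print("mean =", mean_age)
print("sum of deviations =", sum(deviations))
print("variance =", sample_variance, "years^2")
print("SD =", round(sample_sd, 1), "years")
```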

Page 23: Medical statistics Basic concept and applications [Square one]

Description of a binary (dichotomous variable)

o A binary variable: Has only two outcomes (diseased or not diseased).

o The proportion of the population that is diseased (at certain point of time) is called prevalence.

o The new cases occurring is called incidence.

Page 24: Medical statistics Basic concept and applications [Square one]

Dichotomous variables

Prevalence = all cases (new and old) at a point in time / total population

Incidence = new cases during a period / total population at risk

Page 25: Medical statistics Basic concept and applications [Square one]

Probability and Odds

o Odds = chance.

o In a population of 1000, 200 have a certain disease.

o When we randomly take one person out, the probability that this person is diseased = 200/1000 = 0.2 (this is the probability).

o The chance (the odds) that this person is diseased = probability of having the disease / probability of not having the disease.

o Odds = P/(1-P), where P is the probability of disease and 1-P the probability of not having it = 0.2/0.8 = 1/4; the odds are 1 to 4.
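The conversion from probability to odds can be sketched in Python as follows. The function name `odds_from_probability` is invented for this illustration.

```python
def odds_from_probability(p):
    """Return the odds P / (1 - P) for a probability 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

diseased, population = 200, 1000
p_disease = diseased / population            # 0.2
odds_disease = odds_from_probability(p_disease)
print("probability =", p_disease, "odds =", round(odds_disease, 2))  # odds of 1 to 4
```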

Page 26: Medical statistics Basic concept and applications [Square one]

The following table depicts the outcomes of an isoniazid/placebo trial among children with HIV (death within 6 months).

Intervention   Dead (within 6 months)   Alive   Total
Placebo        21                       110     131
Isoniazid      11                       121     132

What is the risk of dying?

Risk (placebo) = 21/131 = 0.160

Risk (isoniazid) = 11/132 = 0.083

Absolute risk difference (ARD) = risk in placebo - risk in isoniazid = 0.077

Relative risk (RR) = risk in placebo / risk in isoniazid ≈ 1.92

Relative risk reduction (RRR) = (risk in placebo - risk in isoniazid) / risk in placebo = 0.48

Number needed to treat (NNT) = 1/ARD = 1/0.077 = 13
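The risk measures for this trial can be reproduced with a short Python sketch. The variable names are illustrative, and rounding NNT up to the next whole patient is a common convention assumed here rather than something the slide states.

```python
import math

# Deaths and group sizes from the isoniazid/placebo table.
dead_placebo, total_placebo = 21, 131
dead_isoniazid, total_isoniazid = 11, 132

risk_placebo = dead_placebo / total_placebo        # about 0.160
risk_isoniazid = dead_isoniazid / total_isoniazid  # about 0.083

ard = risk_placebo - risk_isoniazid   # absolute risk difference
rr = risk_placebo / risk_isoniazid    # relative risk (placebo vs isoniazid)
rrr = ard / risk_placebo              # relative risk reduction
nnt = math.ceil(1 / ard)              # number needed to treat, rounded up

print(f"ARD={ard:.3f} RR={rr:.2f} RRR={rrr:.2f} NNT={nnt}")
```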

Page 27: Medical statistics Basic concept and applications [Square one]

Odds ratio (OR)

o An odds ratio (OR) is a measure of association between an exposure and an outcome.

o The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.

o Odds ratios are most commonly used in case-control studies, however they can also be used in cross-sectional and cohort study designs as well (with some modifications and/or assumptions).

Page 28: Medical statistics Basic concept and applications [Square one]

Basic structure of case-control design

[Diagram: starting at the present time, a sample of diseased subjects (cases) and disease-free subjects (controls) is drawn from the population and traced back to past time. Cases: exposed to factor (a), unexposed to factor (b). Controls: exposed to factor (c), unexposed to factor (d).]

The odds ("chance") of exposure is calculated and compared between both groups.

Page 29: Medical statistics Basic concept and applications [Square one]

Calculation

Case control study

               Diseased                  Not diseased                     Total
Exposed        Cases, exposed (a)        Exposed, not diseased (b)        a+b
Non-exposed    Cases, not exposed (c)    Not exposed, not diseased (d)    c+d

Odds ratio = (a/c) ÷ (b/d) = ad/bc

That is: odds of exposure among the diseased / odds of exposure among the non-diseased

OR = 1: exposure does not affect the odds of the outcome
OR > 1: exposure associated with higher odds of the outcome
OR < 1: exposure associated with lower odds of the outcome

Page 30: Medical statistics Basic concept and applications [Square one]

Odds ratio

Case control study

           Lung cancer   No lung cancer   Total
Smoking    a = 80        b = 30           110
None       c = 20        d = 70           90

ad = 80 x 70 = 5600; bc = 30 x 20 = 600

OR = 5600/600 = 9.3

Or (80/20) ÷ (30/70) = 9.3
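A minimal Python sketch of the cross-product calculation, using the cell labels a, b, c, d from the 2x2 table; the function name `odds_ratio` is chosen for the example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if b == 0 or c == 0:
        raise ValueError("b and c must be non-zero")
    return (a * d) / (b * c)

# Smoking and lung cancer example from the slide.
or_smoking = odds_ratio(a=80, b=30, c=20, d=70)
print("OR =", round(or_smoking, 2))
```

Both routes on the slide, ad/bc and (a/c) ÷ (b/d), give the same value because they are algebraically identical.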

Page 31: Medical statistics Basic concept and applications [Square one]

Basic Structure of cohort study

[Diagram: at the starting point (present time) a disease-free sample is drawn from the population and divided into those exposed and those unexposed to the factor. Both groups are followed into future time. Exposed: develop disease (a), disease-free (b). Unexposed: develop disease (c), disease-free (d).]

The design compares the incidence of disease in each group.

The relative risk is calculated for the exposure.

Page 32: Medical statistics Basic concept and applications [Square one]

Relative risk (RR)

Mammography

Breast cancer No breast cancer

Total

Positive a-10b-90100

Negative c-20d-998980100,100

In Cohort design

RR = [a/(a+b)] ÷ [c/(c+d)] = (10/100) ÷ (20/100,100) = 0.1/0.0002 = 500

Page 33: Medical statistics Basic concept and applications [Square one]

The relative risk (RR)

Cohort study

              Lung cancer   No lung cancer   Total
Smokers       18            582              600
Non-smokers   6             1194             1200

Risk for smokers = 18/600 = 0.03
Risk for non-smokers = 6/1200 = 0.005
RR = 0.03/0.005 = 6
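The cohort calculation can be sketched in Python in the same way; `relative_risk` is an illustrative helper name, and the cell labels follow the a, b, c, d convention used for the cohort design.

```python
def relative_risk(a, b, c, d):
    """Relative risk [a/(a+b)] / [c/(c+d)] for a cohort 2x2 table.

    a: exposed with disease, b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Smokers: 18 of 600 develop lung cancer; non-smokers: 6 of 1200.
rr_smoking = relative_risk(a=18, b=582, c=6, d=1194)
print("RR =", round(rr_smoking, 2))
```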

Page 34: Medical statistics Basic concept and applications [Square one]

The Odds ratio (OR)

Case control study

              Lung cancer   No lung cancer   Total
Smokers       80            30               110
Non-smokers   20            70               90

Odds for smokers = 80/30 = 2.67
Odds for non-smokers = 20/70 = 0.29
OR = (80 x 70)/(30 x 20) = 9.33

Page 35: Medical statistics Basic concept and applications [Square one]

Assignment I

Table 1 Basic characteristics for the patients examined (N=278).

Baseline characteristics 1996                                       Total (N=278)
1 - Men (%)                                                         52.2
2 - Insulin users (%)                                               25.5
3 - Smokers (%)                                                     23.0
4 - Ex-smokers (%)                                                  28.1
5 - Non-smokers (%)                                                 48.9
6 - Age in years (mean ±SD)                                         67.24 ±11.74
7 - Systolic blood pressure at starting point, mmHg (mean ±SD)      151.20 ±22.00
8 - Systolic blood pressure at two years, mmHg (mean ±SD)           153.83 ±29.1
9 - Duration of diabetes (median, quartiles 1st-3rd)                6.0 (2.75-12.25)
10 - Missed values                                                  --

Page 36: Medical statistics Basic concept and applications [Square one]

2a. Smoking history (all subjects)

[Bar chart: percent by smoking history. Yes = 23%, stopped smoking = 28%, never = 49%]

Page 37: Medical statistics Basic concept and applications [Square one]

2b. Smoking history by sex

[Clustered bar chart: percent by smoking history within each SEX. Male: yes = 38%, stopped smoking = 44%, never = 18%. Female: yes = 7%, stopped smoking = 11%, never = 83%]

Page 38: Medical statistics Basic concept and applications [Square one]

3a. Age using a bar chart (mean used as summary)

[Bar chart: mean age (years) by SEX (female, male); vertical axis from 64 to 70]

Page 39: Medical statistics Basic concept and applications [Square one]

3b. Boxplot of age by sex

[Boxplot of age (years) by SEX: female (N = 133), male (N = 145); one outlying case is flagged]

This graph checks the data distribution and reveals outliers.

Page 40: Medical statistics Basic concept and applications [Square one]

4a. Height of the included subjects

[Histogram of height (cm): Mean = 170.5, Std. Dev = 8.89, N = 278]

Median = 170.55 cm

Page 41: Medical statistics Basic concept and applications [Square one]

4b. Duration of diabetes

[Histogram of duration of diabetes (years): Mean = 7.9, Std. Dev = 6.96, N = 278]

Median = 6.0 years

Page 42: Medical statistics Basic concept and applications [Square one]

syst. blood pressure at start

Value   Frequency   Percent   Valid Percent   Cumulative Percent
100     1           .4        .4              .4
110     1           .4        .4              .7
112     2           .7        .7              1.4
115     1           .4        .4              1.8
116     2           .7        .7              2.5
120     21          7.6       7.6             10.1
121     2           .7        .7              10.8
122     1           .4        .4              11.2
124     1           .4        .4              11.5
125     6           2.2       2.2             13.7
127     1           .4        .4              14.0
130     16          5.8       5.8             19.8
131     1           .4        .4              20.1
132     2           .7        .7              20.9
134     1           .4        .4              21.2
135     11          4.0       4.0             25.2
136     1           .4        .4              25.5
137     2           .7        .7              26.3
139     1           .4        .4              26.6
140     28          10.1      10.1            36.7
141     2           .7        .7              37.4
144     4           1.4       1.4             38.8
145     12          4.3       4.3             43.2
147     1           .4        .4              43.5
148     1           .4        .4              43.9
150     31          11.2      11.2            55.0
151     1           .4        .4              55.4
151     23          8.3       8.3             63.7
152     1           .4        .4              64.0
153     1           .4        .4              64.4
155     2           .7        .7              65.1
158     1           .4        .4              65.5
160     21          7.6       7.6             73.0
161     1           .4        .4              73.4
162     1           .4        .4              73.7
163     1           .4        .4              74.1
164     1           .4        .4              74.5
165     5           1.8       1.8             76.3
167     1           .4        .4              76.6
168     2           .7        .7              77.3
170     14          5.0       5.0             82.4
171     1           .4        .4              82.7
172     2           .7        .7              83.5
175     4           1.4       1.4             84.9
176     1           .4        .4              85.3
177     1           .4        .4              85.6
178     1           .4        .4              86.0
179     2           .7        .7              86.7
180     14          5.0       5.0             91.7
182     2           .7        .7              92.4
184     1           .4        .4              92.8
185     1           .4        .4              93.2
187     1           .4        .4              93.5
189     1           .4        .4              93.9
190     6           2.2       2.2             96.0
194     1           .4        .4              96.4
195     1           .4        .4              96.8
200     2           .7        .7              97.5
205     1           .4        .4              97.8
209     1           .4        .4              98.2
210     3           1.1       1.1             99.3
216     1           .4        .4              99.6
220     1           .4        .4              100.0
Total   278         100.0     100.0

5a. Using the frequency table: P95 ≈ 189-190 mmHg

Page 43: Medical statistics Basic concept and applications [Square one]

P95, P5 = Mean ± Z score at the specified percentile x (standard deviation)

P95 SBP1 = 151.2 + 1.645(22.0) = 187.4 mmHg

Probability distribution of the normal curve: page 180
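The formula-based percentile can be reproduced with Python's `statistics.NormalDist`, which supplies the Z score instead of a printed table. This is a sketch that assumes the variable is approximately normal; the helper name `normal_percentile` is invented here.

```python
from statistics import NormalDist

def normal_percentile(mean, sd, percentile):
    """Value at the given percentile (0-100), assuming a normal distribution."""
    z = NormalDist().inv_cdf(percentile / 100)  # e.g. about 1.645 for the 95th
    return mean + z * sd

# Systolic blood pressure at start: mean 151.2 mmHg, SD 22.0 mmHg.
p95_sbp = normal_percentile(151.2, 22.0, 95)
print("P95 =", round(p95_sbp, 1), "mmHg")
```

The result, about 187 mmHg, is close to the 189-190 mmHg read from the frequency table; the two agree only roughly because the observed distribution is not exactly normal.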

Page 44: Medical statistics Basic concept and applications [Square one]

duration of diabetes

Value   Frequency   Percent   Valid Percent   Cumulative Percent
0       12          4.3       4.3             4.3
1       35          12.6      12.6            16.9
2       22          7.9       7.9             24.8
3       21          7.6       7.6             32.4
4       24          8.6       8.6             41.0
5       20          7.2       7.2             48.2
6       23          8.3       8.3             56.5
7       19          6.8       6.8             63.3
8       6           2.2       2.2             65.5
9       6           2.2       2.2             67.6
10      6           2.2       2.2             69.8
11      13          4.7       4.7             74.5
12      2           .7        .7              75.2
13      7           2.5       2.5             77.7
14      6           2.2       2.2             79.9
15      5           1.8       1.8             81.7
16      11          4.0       4.0             85.6
17      8           2.9       2.9             88.5
18      6           2.2       2.2             90.6
19      5           1.8       1.8             92.4
20      3           1.1       1.1             93.5
21      5           1.8       1.8             95.3
22      2           .7        .7              96.0
23      2           .7        .7              96.8
25      3           1.1       1.1             97.8
26      1           .4        .4              98.2
27      1           .4        .4              98.6
28      2           .7        .7              99.3
31      1           .4        .4              99.6
32      1           .4        .4              100.0
Total   278         100.0     100.0

5b-1. P5 for duration of diabetes (read from the cumulative percent column)

Page 45: Medical statistics Basic concept and applications [Square one]

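Or using the formula: Mean - Z score (1.645) x SD = 7.9 - 1.645(6.96) = -3.6 years. A negative duration is impossible, which shows that the normal-curve formula does not suit a skewed variable such as this one.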

Page 46: Medical statistics Basic concept and applications [Square one]

Total population n = 278, μ = 67.24 years, σ = 11.743 years

Page 47: Medical statistics Basic concept and applications [Square one]

28 samples of 150 from a total population of 278 (age in years)

Sample no.          Mean    SD
1                   67.6    12.07
2                   67.13   11.81
3                   67      11.98
4                   67.8    11.63
5                   66.33   11.44
6                   67.44   11.95
7                   67.84   12.42
8                   66.59   11.36
9                   67      12
10                  66.38   11.9
11                  68.06   12.06
12                  67.61   11.02
13                  67.31   11.33
14                  66.44   11.91
15                  66.87   11.26
16                  66.8    11.5
17                  66.73   12.37
18                  66.38   11.77
19                  67.03   11.22
20                  66.58   12.13
21                  66.81   11.55
22                  66.58   12.21
23                  67.2    11.61
24                  66.48   11.48
25                  67.53   12.1
26                  67.58   10.6
27                  67      11.91
28                  67.31   11.59
Mean of the means   67.03   11.3

[Chart: mean and SD of age (years) plotted for each of the 28 samples]

Page 48: Medical statistics Basic concept and applications [Square one]

Population and Sample

o In scientific research we want to make a statement (conclusion) about the population.

o Studying the whole population is usually impossible in terms of money, time and labor.

o Instead, we draw a random sample from the population and infer the needed conclusions from the sample data.

o The task of statistics is to quantify the uncertainty (how well the sample really represents that population).

Page 49: Medical statistics Basic concept and applications [Square one]

The concept of sampling

Study population: sampling units

You select a few sampling units from the study population.

Sample

You collect information from these people to find answers to your research questions.

You make an estimate ("prediction") extrapolated to the study population (prevalence, outcomes, etc.).

Page 50: Medical statistics Basic concept and applications [Square one]

What would be the mean systolic blood pressure of older subjects (65+) in Al Hassa?

Population mean (μ) = unknown

Sample values: 175, 165, 180, 155

From our sample we calculate an estimate of the population parameter.

Page 51: Medical statistics Basic concept and applications [Square one]

The good sample (the estimator)

Should be :

Unbiased:

The mean of sample = population mean

Precise: (narrow dispersion about the mean)

The dispersion in repeated samples is small

This is a dream

Page 52: Medical statistics Basic concept and applications [Square one]

Sampling error

Four individuals A, B, C, D:

A = 18 years
B = 20 years
C = 23 years
D = 25 years

Their mean age = (18+20+23+25)/4 = 86/4 = 21.5 years (population mean μ).

Page 53: Medical statistics Basic concept and applications [Square one]

Possible samples of two individuals (6 combinations) and their mean ages:

A+B = (18+20)/2 = 19.0 years
A+C = (18+23)/2 = 20.5 years
A+D = (18+25)/2 = 21.5 years
B+C = (20+23)/2 = 21.5 years
B+D = (20+25)/2 = 22.5 years
C+D = (23+25)/2 = 24.0 years

Sampling error = population mean - sample mean: ranges from -2.5 to +2.5 years.

Possible samples of three individuals (4 combinations) and their mean ages:

A+B+C = (18+20+23)/3 = 20.33 years
A+B+D = (18+20+25)/3 = 21.00 years
A+C+D = (18+23+25)/3 = 22.00 years
B+C+D = (20+23+25)/3 = 22.67 years

Sampling error ranges from -1.17 to +1.17 years: the larger the sample, the smaller the sampling error.

If the ages vary more, for example B=26 (instead of 20), C=32 (instead of 23) and D=40 (instead of 25) years, so that μ = 29 years: the sampling error ranges from -7.00 to +7.00 years for samples of 2 and from -3.67 to +3.67 years for samples of 3.

The greater the variability of a given variable, the larger the sampling error for a given sample size.
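The enumeration of every possible sample can be automated with `itertools.combinations`. The sketch below uses the original four ages (18, 20, 23, 25); the helper name `sampling_errors` is made up for the illustration.

```python
from itertools import combinations
from statistics import mean

def sampling_errors(ages, sample_size):
    """Population mean minus sample mean for every possible sample."""
    mu = mean(ages)
    return [mu - mean(sample) for sample in combinations(ages, sample_size)]

ages = [18, 20, 23, 25]  # individuals A, B, C, D

errors_of_two = sampling_errors(ages, 2)
errors_of_three = sampling_errors(ages, 3)

print("n=2:", min(errors_of_two), "to", max(errors_of_two))
print("n=3:", round(min(errors_of_three), 2), "to", round(max(errors_of_three), 2))
```

Replacing `ages` with a more spread-out set of values widens both ranges, which is the point made in the last line of the slide.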

Page 54: Medical statistics Basic concept and applications [Square one]

Across repeated (in principle infinitely many) samples, a good estimator should represent the population it came from.

Page 55: Medical statistics Basic concept and applications [Square one]

Part 2

o The normal distribution
o The standard error of the mean
o Estimation:
  - Reference interval
  - Confidence intervals for a mean or a proportion, for the difference between means/proportions, and for RR and OR

Page 56: Medical statistics Basic concept and applications [Square one]

Normal Distribution: Many human traits, such as intelligence, personality, and attitudes, as well as weight and height, are distributed among populations in a fairly normal way.

Page 57: Medical statistics Basic concept and applications [Square one]
Page 58: Medical statistics Basic concept and applications [Square one]

The normal distribution

About 68% of values lie within μ ± 1 SD (σ)

About 95% of values lie within μ ± 2 SD (σ)

Beyond 2 SDs: possible outliers

Beyond 3 SDs: definite outliers

Page 59: Medical statistics Basic concept and applications [Square one]

One more: the Z score, which measures how many standard deviations a particular data point lies above or below the mean.

o Unusual observations have a Z score above +2 or below -2.
o Extreme observations have a Z score above +3 or below -3 and should be investigated as potential outliers.

Z = (X - X̄) / s
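A small Python sketch of the Z score and the two cut-offs described above. The function names and the two example readings are illustrative; the mean and SD are the systolic blood pressure values reported earlier in the deck (151.2 and 22.0 mmHg).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def describe(z):
    """Label an observation using the cut-offs of 2 and 3 standard deviations."""
    if abs(z) > 3:
        return "extreme (potential outlier)"
    if abs(z) > 2:
        return "unusual"
    return "usual"

mean_sbp, sd_sbp = 151.2, 22.0
for reading in (160, 220):  # hypothetical readings in mmHg
    z = z_score(reading, mean_sbp, sd_sbp)
    print(reading, "mmHg: Z =", round(z, 2), "->", describe(z))
```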

Page 60: Medical statistics Basic concept and applications [Square one]

Areas under the standard normal curve.

Z        Area between both points    Beyond both points   Beyond one point
         (around the mean)           (two tails)          (one tail)
±0.1     0.080                       0.920                0.4600
±0.2     0.159                       0.841                0.4205
±0.3     0.236                       0.764                0.3820
±0.4     0.311                       0.689                0.3445
±0.5     0.383                       0.617                0.3085
±0.6     0.451                       0.549                0.2745
±0.7     0.516                       0.484                0.2420
±0.8     0.576                       0.424                0.2120
±0.9     0.632                       0.368                0.1840
±1       0.683                       0.317                0.1585
±1.1     0.729                       0.271                0.1355
±1.2     0.770                       0.230                0.1150
±1.3     0.806                       0.194                0.0970
±1.4     0.838                       0.162                0.0810
±1.5     0.866                       0.134                0.0670
±1.6     0.890                       0.110                0.0550
±1.645   0.900                       0.100                0.0500
±1.7     0.911                       0.089                0.0445
±1.8     0.928                       0.072                0.0360
±1.9     0.943                       0.057                0.0290
±1.96    0.950                       0.050                0.0250
±2       0.954                       0.046                0.0230
±2.1     0.964                       0.036                0.0180
±2.2     0.972                       0.028                0.0140
±2.3     0.979                       0.021                0.0105
±2.4     0.984                       0.016                0.0080
±2.576   0.990                       0.010                0.0050

Page 61: Medical statistics Basic concept and applications [Square one]

Calculating values from Z-scores

Xi = Mean± Z (standard deviation).

Value (percentiles) =Mean± Z score*(SD)

Page 62: Medical statistics Basic concept and applications [Square one]

Random sample for estimating a population mean

μ?

X1=128

X2=133

X3=129

From the information in the sample, we will estimate the unknown population mean (the sample mean X̄ is an estimator for μ).

What could have happened if we had another random sample?

What is the measure of variation of sample means?

Page 63: Medical statistics Basic concept and applications [Square one]

The Sampling Distribution of a Sample Statistics

≈ Let's assume that we want to survey a community of 400 people whose ages were recorded, giving the following parameters:

µ = 35 years, σ = 13 years

≈ Let's assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them about their ages and calculate the mean age.

≈ Then we put them back into the community and randomly select another 120 residents (who may include members of the first sample).

≈ We do this over and over, and each time we calculate the mean age.

≈ The results will be like those in the following table.

Page 64: Medical statistics Basic concept and applications [Square one]

Distribution of 20 random sample means (number of samples = 20)

Sample number     Sample mean
1                 34.7
2                 35.9
3                 35.5
4                 34.7
5                 34.5
6                 34.4
7                 35.7
8                 34.6
9                 37.4
10                35.3
11                34.1
12                35.5
13                34.9
14                36.2
15                35.6
16                35.0
17                35.1
18                36.4
19                35.6
20                33.6
SD of the means   13.37

[Dot plot: the 20 sample means on an axis from 33 to 37 years, clustered around μ = 35]

All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation = 1/20 = 5%).

Those 400 people have an age range from 2 to 69 years, while the means of the samples have a very narrow range of about 4 years, and 10 samples coincide with the population mean (35 years).

Page 65: Medical statistics Basic concept and applications [Square one]

Most of the samples will cluster around the population parameter, with an occasional sample result falling relatively further to one side or the other of the distribution. This is called the sampling distribution of sample means. It has the following properties:

o The mean of the sampling distribution, the average of the averages, is equal to the population mean µ.
o The standard deviation of the sample means is the standard error: SE = σ/√n (σ = population SD).
o The distribution of the sample means is Normal if the population distribution is Normal.
o If the population distribution is not Normal, the distribution of the sample means is almost Normal when n is large (Central Limit Theorem).
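These properties can be illustrated with a small simulation. The sketch below is not the author's data: it builds a hypothetical, deliberately non-normal population of 400 ages with a fixed random seed, draws repeated samples of 120 with replacement, and compares the spread of the sample means with σ/√n.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Means of repeated random samples drawn with replacement."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 400 ages spread evenly between 2 and 69 years.
population_rng = random.Random(1)
population = [population_rng.uniform(2, 69) for _ in range(400)]

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

sample_means = simulate_sample_means(population, sample_size=120, n_samples=2000)
expected_se = sigma / 120 ** 0.5
observed_se = statistics.stdev(sample_means)

print("population mean:", round(mu, 2))
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("sigma/sqrt(n):", round(expected_se, 2), "observed SD of means:", round(observed_se, 2))
```

Sampling with replacement is used so that σ/√n applies directly; drawing 120 of 400 without replacement would need a finite-population correction, which the slide does not cover.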

Page 66: Medical statistics Basic concept and applications [Square one]

Standard error of the mean

Population (parameters): mean, S.D

Sample (statistics): mean, S.D

The standard error is the degree to which the sample statistics deviate from the population parameters.

The term error reflects the fact that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.

Page 67: Medical statistics Basic concept and applications [Square one]
Page 68: Medical statistics Basic concept and applications [Square one]

Central Limit Theorem

The formula for SE = SD/√n. The formula indicates that we are estimating the SE given the S.D of a sample of size n.

For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.
For a sample of 1000 and S.D of 40, the SE = 40/√1000 = 1.26.

Two factors influence the SE: the sample size and the S.D of the sample.

Sample size has the greater impact, as √n is the denominator.

For a sample of 100 and S.D of 20, the SE = 20/√100 = 2.
For a sample of 100 and S.D of 40, the SE = 40/√100 = 4.

The more variability there is within a sample, the greater the SE.
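The four worked examples can be reproduced with a one-line function; the name `standard_error` is chosen for this sketch.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

# (SD, n) pairs from the slide.
for sd, n in [(40, 100), (40, 1000), (20, 100), (40, 100)]:
    print(f"SD={sd}, n={n}: SE={standard_error(sd, n):.2f}")
```

Multiplying the sample size by ten shrinks the SE only by a factor of √10 (from 4 to about 1.26), while halving the SD halves the SE.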

Page 69: Medical statistics Basic concept and applications [Square one]
Page 70: Medical statistics Basic concept and applications [Square one]

Confidence Interval (CI)

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

Page 71: Medical statistics Basic concept and applications [Square one]

We need to know the smallest and the largest values of μ (the range) that we think are likely, using the sample statistics. The mean of the sample is our estimate of μ.

Page 72: Medical statistics Basic concept and applications [Square one]
Page 73: Medical statistics Basic concept and applications [Square one]

c = level of confidence    Zc = Z critical value (under the normal curve)
90%                        1.645
95%                        1.960
99%                        2.576

C.I = mean of the sample ± Zc x SEM, where SEM = SD/√n
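The interval formula can be sketched in Python with `statistics.NormalDist` supplying the critical Z value. The function name `mean_confidence_interval` is illustrative; the inputs are the systolic blood pressure summary reported in the deck (mean 151.20, SD 21.997), and n = 278 is assumed from Table 1.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation CI: mean ± Z_c * SD / sqrt(n)."""
    z_critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_critical * sd / sqrt(n)
    return mean - half_width, mean + half_width

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(151.20, 21.997, 278, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} mmHg")
```

Statistical packages usually base the interval on the t distribution rather than Z, so their bounds can differ from this sketch in the second decimal place.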

Page 74: Medical statistics Basic concept and applications [Square one]

C.I

• The confidence interval provides a range that is highly likely (often 95% or 99%) to contain the true population parameter that is being estimated.

• The narrower the interval the more informative is the result.

• It is usually calculated using the estimate (sample mean) and its standard error (SEM).

Page 75: Medical statistics Basic concept and applications [Square one]

CI for μ: systolic blood pressure in 278 diabetic patients

Descriptives: syst. blood pressure at start (all patients)

                                Statistic   Std. Error
Mean                            151.20      1.319
90% CI for mean, lower bound    149.02
90% CI for mean, upper bound    153.38
5% trimmed mean                 150.30
Median                          150.00
Variance                        483.880
Std. deviation                  21.997
Minimum                         100
Maximum                         220
Range                           120
Interquartile range             30.00
Skewness                        .540        .146
Kurtosis                        .152        .291

90% C.I = 151.20 ± 1.65(21.997/√278); C.I = 149.02-153.38 mmHg

Descriptives: syst. blood pressure at start (random sample of 50 out of 278)

                                Statistic   Std. Error
Mean                            155.06      3.064
90% CI for mean, lower bound    149.92
90% CI for mean, upper bound    160.20
5% trimmed mean                 154.72
Median                          151.20
Variance                        460.033
Std. deviation                  21.448
Minimum                         115
Maximum                         205
Range                           90
Interquartile range             30.00
Skewness                        .263        .340
Kurtosis                        -.506       .668

Page 76: Medical statistics Basic concept and applications [Square one]

Descriptives: syst. blood pressure at start (all patients)

                                Statistic   Std. Error
Mean                            151.20      1.319
95% CI for mean, lower bound    148.60
95% CI for mean, upper bound    153.80
5% trimmed mean                 150.30
Median                          150.00
Variance                        483.880
Std. deviation                  21.997
Minimum                         100
Maximum                         220
Range                           120
Interquartile range             30.00
Skewness                        .540        .146
Kurtosis                        .152        .291

95% C.I = 151.20 ± 1.96(21.997/√278); C.I = 148.60-153.80 mmHg

Descriptives: syst. blood pressure at start (random sample of 50 out of 278)

                                Statistic   Std. Error
Mean                            155.06      3.064
95% CI for mean, lower bound    148.90
95% CI for mean, upper bound    161.22
5% trimmed mean                 154.72
Median                          151.20
Variance                        460.033
Std. deviation                  21.448
Minimum                         115
Maximum                         205
Range                           90
Interquartile range             30.00
Skewness                        .263        .340
Kurtosis                        -.506       .668

Page 77: Medical statistics Basic concept and applications [Square one]

Descriptives: syst. blood pressure at start (all patients)

                                Statistic   Std. Error
Mean                            151.20      1.319
99% CI for mean, lower bound    147.78
99% CI for mean, upper bound    154.62
5% trimmed mean                 150.30
Median                          150.00
Variance                        483.880
Std. deviation                  21.997
Minimum                         100
Maximum                         220
Range                           120
Interquartile range             30.00
Skewness                        .540        .146
Kurtosis                        .152        .291

99% C.I = 151.20 ± 2.58(21.997/√278); C.I = 147.78-154.62 mmHg

Descriptives: syst. blood pressure at start (random sample of 50 out of 278)

                                Statistic   Std. Error
Mean                            155.06      3.064
99% CI for mean, lower bound    146.84
99% CI for mean, upper bound    163.28
5% trimmed mean                 154.72
Median                          151.20
Variance                        460.033
Std. deviation                  21.448
Minimum                         115
Maximum                         205
Range                           90
Interquartile range             30.00
Skewness                        .263        .340
Kurtosis                        -.506       .668

Page 78: Medical statistics Basic concept and applications [Square one]

90% C.I = 151.20 ± 1.65(21.997/√278); C.I = 149.02-153.38 mmHg

95% C.I = 151.20 ± 1.96(21.997/√278); C.I = 148.60-153.80 mmHg

99% C.I = 151.20 ± 2.58(21.997/√278); C.I = 147.78-154.62 mmHg

What does this mean? It means that if the same population is sampled on numerous occasions and an interval estimate is made on each occasion, the resulting intervals would bracket the true population parameter in approximately 90, 95 and 99% of the cases, respectively.

Page 79: Medical statistics Basic concept and applications [Square one]

The sample distribution of a proportion

p = K/n (K = number with the characteristic, n = sample size)

SE(p) = √[p(1-p)/n]

95% CI = p ± 1.96 x SE(p), where 1.96 is the Z critical score for 95%

Page 80: Medical statistics Basic concept and applications [Square one]

Smokers among diabetics

Sample = 400; Smokers = 40; P = 40/400 = 0.1; SE(p) = √(0.1 × 0.9/400) = 0.015

95% CI for p = 0.1 ± 1.96 × (0.015) = [0.07 to 0.13]

As a percentage the result is the same: SE = 1.5%, C.I = [7% to 13%]
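The same calculation as a short Python sketch, using the normal approximation from the slide. The function name `proportion_ci` is an illustrative choice, not part of any library.

```python
import math

def proportion_ci(k, n, z=1.96):
    """Normal-approximation CI for a proportion: p ± z * sqrt(p(1-p)/n)."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# 40 smokers among 400 diabetics, as on the slide.
p, se, (lower, upper) = proportion_ci(40, 400)
print(f"p = {p:.2f}, SE = {se:.3f}, 95% CI = {lower:.2f} to {upper:.2f}")
```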

Page 81: Medical statistics Basic concept and applications [Square one]

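95% CI for the difference between two means (μ1-μ2)

Smoke        n     Mean SBP   SE (mean)
No           214   153.1      1.50
Yes          64    144.8      2.62
Difference         8.3

$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{(SE(\bar{x}_1))^2 + (SE(\bar{x}_2))^2}$

$CI_{95\%} = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$

C.I = 2.4 to 14.2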
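A minimal sketch of this calculation in Python, starting from the two group means and their standard errors as tabulated. The function name `diff_of_means_ci` is hypothetical.

```python
import math

def diff_of_means_ci(m1, se1, m2, se2, z=1.96):
    """CI for a difference of two independent means from their standard errors."""
    diff = m1 - m2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # SEs combine in quadrature
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)

# Non-smokers vs smokers, values from the slide.
diff, se_diff, (lower, upper) = diff_of_means_ci(153.1, 1.50, 144.8, 2.62)
print(f"difference = {diff:.1f}, SE = {se_diff:.2f}, 95% CI = {lower:.1f} to {upper:.1f}")
```

Because the interval does not include 0, the difference is significant at the 5% level.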

Page 82: Medical statistics Basic concept and applications [Square one]

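95% CI for percentage

Smoke (n)    % died   SE
No (212)     28.8     3.11
Yes (64)     23.4     5.30

Difference = 5.4%

$SE(P_{ns} - P_{s}) = \sqrt{\dfrac{P_1(100 - P_1)}{n_1} + \dfrac{P_2(100 - P_2)}{n_2}}$

$CI_{95\%} = (P_{ns} - P_{s}) \pm 1.96 \times SE(P_{ns} - P_{s})$

95% C.I = -6.7% to 17.4%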
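The percentage version can be sketched the same way. The function name `diff_of_percentages_ci` is hypothetical, and the bounds differ from the slide in the second decimal because the slide works with rounded standard errors.

```python
import math

def diff_of_percentages_ci(p1, n1, p2, n2, z=1.96):
    """CI for a difference of two independent percentages (0-100 scale)."""
    se1_sq = p1 * (100 - p1) / n1
    se2_sq = p2 * (100 - p2) / n2
    diff = p1 - p2
    se_diff = math.sqrt(se1_sq + se2_sq)
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)

# % died among non-smokers (n = 212) and smokers (n = 64), from the slide.
diff, se_diff, (lower, upper) = diff_of_percentages_ci(28.8, 212, 23.4, 64)
print(f"difference = {diff:.1f}%, 95% CI = {lower:.1f}% to {upper:.1f}%")
```

Here the interval includes 0, so the difference in mortality is not significant at the 5% level.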

Page 83: Medical statistics Basic concept and applications [Square one]

95% CI for RR and OR

Use available software

http://www.medcalc.org/calc/relative_risk.php

http://www.medcalc.org/calc/odds_ratio.php

vl.academicdirect.org/applied_statistics/.../CIcalculator.xls

Page 84: Medical statistics Basic concept and applications [Square one]

Assignment II

Page 85: Medical statistics Basic concept and applications [Square one]

Inferential Statistics: Testing in research

o In scientific research we would like to test if our research ideas are true.

o Based on previous observations (studies) we know that the mean cholesterol of patients with diabetes is higher than those without the disease.

o We will take samples and check whether the results will agree with our expectations.

o Meaning we are going to test the situation using a statistical test.

Page 86: Medical statistics Basic concept and applications [Square one]

The Z-test for one sample

Serum cholesterol in the diabetes-free population: μ = 5 mmol/L, σ = 1.5

Diabetic patients: mean cholesterol > 5?

Considering σ = 1.5, is there any difference between the diabetes-free population and the diabetic patients regarding serum cholesterol? Let's perform a Z test.

Page 87: Medical statistics Basic concept and applications [Square one]

Research question (hypothesis)

The research hypothesis would be:

The mean cholesterol of diabetics is > 5 mmol/L

Null hypothesis H0: μ = 5 (the mean cholesterol of diabetics equals the population mean)

Alternative hypothesis H1: μ > 5 (one sided)

or H1: μ ≠ 5 (two sided)

Page 88: Medical statistics Basic concept and applications [Square one]

Procedure

[Figure: histogram of total cholesterol (mmol/L) in diabetic patients. Mean = 6.25, Std. Dev = 1.33, N = 278. The population mean (μ = 5) and the mean of the sample are marked.]

If the sample mean is close to the population mean, we do not reject the null hypothesis.

If the sample mean differs clearly from the population mean, we REJECT the null.

Page 89: Medical statistics Basic concept and applications [Square one]

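The α level and the P value

The P value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.

Here the null hypothesis states that there is no difference between the population mean and the mean of the population from which the sample was drawn.

The α level is the maximum probability we accept of rejecting the null hypothesis falsely.

α = 0.05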

Page 90: Medical statistics Basic concept and applications [Square one]

Alpha level

P ≤ 0.05 (α): reject the null; the sample mean differs significantly from the population mean.

P > 0.05 (α): do not reject the null; the sample mean is compatible with the population mean.

Page 91: Medical statistics Basic concept and applications [Square one]

Calculation (σ=1.5)

SEM = σ/√n = 0.3

Z = (sample mean − μ)/SEM

P(sample mean ≥ 6) = P(Z ≥ (6 − 5)/0.3) = P(Z ≥ 3.33) < 0.0005

Under the normal curve, the area of rejection lies beyond Z = 1.96.

P < 0.0005: if the diabetic patients had the same mean cholesterol as the disease-free population, a sample mean this high would be expected in fewer than 5 of 10,000 repetitions of the study.

P < 0.05, so we reject the null. The diabetics have a larger mean cholesterol level than the normal population.
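The Z calculation can be checked with the standard library alone. This sketch takes SEM = 0.3 and a sample mean of 6 directly from the slide; the helper name `upper_tail_p` is illustrative, and the normal tail area is obtained from the complementary error function.

```python
import math

def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

MU = 5.0           # population mean (mmol/L)
SEM = 0.3          # standard error of the mean, as given on the slide
SAMPLE_MEAN = 6.0  # sample mean used in the slide's calculation

z = (SAMPLE_MEAN - MU) / SEM
p_one_sided = upper_tail_p(z)
print(f"Z = {z:.2f}, one-sided P = {p_one_sided:.5f}")
```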

Page 92: Medical statistics Basic concept and applications [Square one]

In reality

It is unlikely that the σ (population SD) is known.

In most of the cases, σ will be unknown and we will be able to apply neither the formula nor the table of normal distribution (areas under the curve=Z score).

We resort to other statistical tests.

Page 93: Medical statistics Basic concept and applications [Square one]

Possible situations in testing

Page 94: Medical statistics Basic concept and applications [Square one]

Possible situations in Hypothesis testing

                    Decision
Reality             Reject H0             Do not reject H0
H0 is true          Type I error (α)      OK (1-α)
H0 is not true      OK (1-β)              Type II error (β)

α = level of significance

1-β = Power: the probability of rejecting the null hypothesis when it is NOT TRUE. Usually 80% is the least required for any test.

Page 95: Medical statistics Basic concept and applications [Square one]

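Errors of Hypothesis Testing and Power

Decisions and errors in hypothesis testing

Conclusion from hypothesis testing (study results) versus the true situation:

Conclusion: difference exists (reject H0)
  True situation, difference exists (H1): correct decision (power or 1-β)
  True situation, no difference (H0): Type I error or α

Conclusion: no difference (do not reject H0)
  True situation, difference exists (H1): Type II error or β
  True situation, no difference (H0): correct decision

Type I error (α), false rejection: H0 is rejected when it is true; the study concludes there is a difference when there really is none.

Type II error (β), false acceptance: the study concludes there is no difference when one is really present.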

Page 96: Medical statistics Basic concept and applications [Square one]

Passive smoking and lung cancer

Truth about the population: passive smoking is either related to lung cancer or not related to lung cancer.

Conclusions, based on results from a study of a sample of the population:

Reject the null hypothesis (rates in the study appear to be different). If passive smoking is really not related to lung cancer, this is a Type I error (incorrect rejection).

Accept the null hypothesis (rates in the study appear similar). If passive smoking really is related to lung cancer, this is a Type II error (incorrect acceptance).

Page 97: Medical statistics Basic concept and applications [Square one]

The Alpha-Fetoprotein (AFP) test has both Type I and Type II error possibilities.

This test screens the mother's blood during pregnancy for AFP and determines risk. Abnormally high or low levels may indicate Down syndrome.

H0: patient is healthy

Ha: patient is unhealthy

Type I error (false positive or false rejection): the test wrongly indicates that the fetus has Down syndrome, so a healthy pregnancy may be terminated for no reason.

Type II error (false negative or false acceptance): the test is negative and the child is born with multiple anomalies.

Page 98: Medical statistics Basic concept and applications [Square one]

Hypothesis Test

This is the distribution given the null hypothesis is true

Page 99: Medical statistics Basic concept and applications [Square one]

Type I and Type II Error

False rejection

False acceptance

Page 100: Medical statistics Basic concept and applications [Square one]

One Sample

The distribution of X under the null and alternative hypotheses.

Page 101: Medical statistics Basic concept and applications [Square one]

t-distribution

In real-life situations we estimate the unknown population SD using the sample SD.

Results are standardized to the t-distribution:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

Z test for the normal distribution, when the population SD is known:

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Page 102: Medical statistics Basic concept and applications [Square one]

t-distribution

Heavier tails than the Z distribution

df=No. of observations (sample size)-1

Page 103: Medical statistics Basic concept and applications [Square one]

Degree of freedom (df)

For all sample statistics (variance, SD) we used n-1. All the observations in any given sample are free to vary except one (complementary effect).

Page 104: Medical statistics Basic concept and applications [Square one]

Degree of freedom

[Figure: four values, 7, 15, 12 and 16, with total = 50. Once the total is fixed, one value is restricted: df = n-1.]

Page 105: Medical statistics Basic concept and applications [Square one]

t-distribution

Page 106: Medical statistics Basic concept and applications [Square one]

t-test: steps to determine the statistical difference

When? Descriptive statistics: mean ± standard deviation

Number of samples:

One sample vs. population mean: $t = \dfrac{\bar{x} - \mu}{SD/\sqrt{n}}$

Two independent samples: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$

Two dependent samples (paired t; repeated measures, matched pairs): $t = \dfrac{\bar{d}}{SE(\bar{d})}$

Steps:
1- State the hypothesis to be tested: Null: mean = mean. Alternative: mean ≠ mean (non-directional, two-tailed) or mean > mean (unidirectional, one-tailed).
2- Find the calculated t value using the formulae.
3- Find the degrees of freedom: n-1 (for two independent samples, df = n1-1 + n2-1 = n1+n2-2).
4- Find the P value using the tables of the t-distribution.
5- Conclude: if P < 0.05, the null is rejected; if P > 0.05, the null is not rejected.

Page 107: Medical statistics Basic concept and applications [Square one]

t-test (Student's t-test), one sample

$t = \dfrac{\bar{x} - \mu}{SD/\sqrt{n}}$

Using diabetes data: Is the mean age of diabetics > 65 years?

Statistics: age (years)

N Valid               278
N Missing             0
Mean                  67.24
Std. Error of Mean    .704
Std. Deviation        11.743
Variance              137.902

H0: μ = 65; H1: μ ≠ 65

t (one sample) = (67.24 − 65)/(SD/√n) = 2.24/0.704 = 3.18

t distribution: P = 0.002. Reject the null. The mean age of diabetics is significantly higher than 65 years.
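The one-sample t value can be recomputed from the summary statistics. This is a sketch only: the function name `one_sample_t` is hypothetical, and instead of an exact P value it compares |t| with the two-tailed 5% critical value for very large df (1.960), which is a close approximation for df = 277.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom for a one-sample t-test from summary data."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, n - 1

# Age of diabetics, values from the SPSS output above; test value 65 years.
t, df = one_sample_t(67.24, 11.743, 278, 65)

T_CRITICAL = 1.960  # two-tailed 5% critical value for large df (approximation)
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```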

Page 108: Medical statistics Basic concept and applications [Square one]

One-Sample Test (Test Value = 65)

              t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
age (years)   3.182   277   .002              2.24              .85, 3.63

Sig. (2-tailed) is the two-sided P value; df is the degrees of freedom.

Assumptions: the distribution of age is normal; the population SD (σ) is unknown.

Page 109: Medical statistics Basic concept and applications [Square one]

t-test for comparison of means of two independent samples

H0: Smoking has no effect on systolic blood pressure: Mean S = Mean NS, or Mean S − Mean NS = 0

H1: Smoking has an effect: Mean S ≠ Mean NS, or Mean S − Mean NS ≠ 0

Assumptions:
• Independent observations (2 samples)
• Normally distributed
• Equal variances (for the pooled t-test)

Page 110: Medical statistics Basic concept and applications [Square one]

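Three formulae

Standardized difference (0 is the expected difference if H0 is true; the denominator is the SD of the difference):

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

If SDs are equal (pooled SD):

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2/n_1 + S_p^2/n_2}}, \qquad S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$

If SDs are not equal:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

Decision based on Levene's test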

Page 111: Medical statistics Basic concept and applications [Square one]

Group Statistics: syst. bloodpressure at start

SMOKING    N     Mean     Std. Deviation   Std. Error Mean
no         214   153.11   21.995           1.504
smokers    64    144.82   20.934           2.617

Independent Samples Test: syst. bloodpressure at start

Levene's Test for Equality of Variances: F = .006, Sig. = .936

t-test for Equality of Means

                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.674   276       .008              8.29              3.100                   2.188          14.392
Equal variances not assumed   2.747   107.982   .007              8.29              3.018                   2.308          14.272

Levene's test is not significant, which means equal variances: the variances are apparently equal.

The output contains two separate t-tests. P value < 0.05 (Sig. = .008): reject H0.
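Both rows of the SPSS output can be recomputed from the group statistics. The sketch below uses hypothetical function names; the degrees of freedom for the unequal-variance row use the Welch-Satterthwaite approximation, which is an assumption here (the slides do not give that formula) but reproduces the df SPSS reports.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t with a pooled variance (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t without assuming equal variances (Welch-Satterthwaite df)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df

# Non-smokers vs smokers, systolic blood pressure at start (Group Statistics).
groups = (153.11, 21.995, 214, 144.82, 20.934, 64)
t_pooled, df_pooled = pooled_t(*groups)
t_welch, df_welch = welch_t(*groups)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}")
```

Since Levene's test is not significant here, the pooled row is the one to read; both rows lead to the same conclusion.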

Page 112: Medical statistics Basic concept and applications [Square one]

Paired t-test

Use when we have paired data (two repeated measurements on the same subjects, or before and after) and the differences between the paired observations are Normally distributed.

Page 113: Medical statistics Basic concept and applications [Square one]

Paired samples (dependent)

(Paired / dependent 2-sample t-test)

• To compare observations collected from the same group of individuals on 2 separate occasions (dependent observations or paired samples).

• The paired t statistic is calculated by:

- Calculate the difference between the 2 measurements taken on each individual.
- Calculate the mean of the differences.
- Calculate the SE of the observed differences.
- Under the null hypothesis of no difference (difference = 0), the paired t statistic takes the form t = mean difference / SE of the difference:

$t = \dfrac{\bar{d} - 0}{SE(\bar{d})}$

- It has a t-distribution with degrees of freedom = (n-1).

Page 114: Medical statistics Basic concept and applications [Square one]

Example: Four students had the following scores in 2 subsequent tests. Is there a significant difference in their performance?

Number   Name       Test 1   Test 2   Dif
1        Mike       35%      67%      -32
2        Melanie    50%      46%      4
3        Melissa    90%      86%      4
4        Mitchell   78%      91%      -13

Mean Dif = -9.25, SD Dif = 17.152, SE Dif = 8.58

Calculated paired t = (-9.25 − 0)/8.58 = -1.078, df = n-1 = 3

$t = \dfrac{\bar{d} - 0}{SE(\bar{d})}$
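The worked example can be checked with Python's `statistics` module. This is a minimal sketch; the variable names are illustrative, and the decision uses the two-tailed 5% critical value for df = 3 (3.182) from the standard t table.

```python
import math
import statistics

# Scores of the four students in the two tests (percent).
test1 = [35, 50, 90, 78]
test2 = [67, 46, 86, 91]

diffs = [a - b for a, b in zip(test1, test2)]   # Test 1 minus Test 2
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)                  # sample SD (n-1 in the denominator)
se_d = sd_d / math.sqrt(len(diffs))
t = mean_d / se_d
df = len(diffs) - 1

T_CRITICAL = 3.182  # two-tailed 5% critical value for df = 3
reject_h0 = abs(t) > T_CRITICAL
print(f"mean d = {mean_d}, SD = {sd_d:.2f}, SE = {se_d:.2f}, t = {t:.3f}, df = {df}")
```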

Page 115: Medical statistics Basic concept and applications [Square one]

Critical values of the t-distribution

      Level of significance for one-tail test
      0.10    0.05    0.025    0.01     0.005
      Level of significance for two-tail test
df    0.20    0.10    0.05     0.02     0.01
1     3.078   6.314   12.706   31.821   63.657
2     1.886   2.920   4.303    6.965    9.925
3     1.638   2.353   3.182    4.541    5.841
4     1.533   2.132   2.776    3.747    4.604
5     1.476   2.015   2.571    3.365    4.032
6     1.440   1.943   2.447    3.143    3.707
7     1.415   1.895   2.365    2.998    3.499
8     1.397   1.860   2.306    2.896    3.355
9     1.383   1.833   2.262    2.821    3.250
10    1.372   1.812   2.228    2.764    3.169
11    1.363   1.796   2.201    2.718    3.106
12    1.356   1.782   2.179    2.681    3.055
13    1.350   1.771   2.160    2.650    3.012
14    1.345   1.761   2.145    2.624    2.977
15    1.341   1.753   2.131    2.602    2.947
16    1.337   1.746   2.120    2.583    2.921
17    1.333   1.740   2.110    2.567    2.898
18    1.330   1.734   2.101    2.552    2.878
19    1.328   1.729   2.093    2.539    2.861
20    1.325   1.725   2.086    2.528    2.845
21    1.323   1.721   2.080    2.518    2.831
35    1.306   1.690   2.030    2.438    2.724
50    1.299   1.676   2.009    2.403    2.678
∞     1.282   1.645   1.960    2.326    2.576

For the paired example (|t| = 1.078, df = 3), the calculated t is below 1.638, so the two-tailed P value is > 0.20 and the null is not rejected.

Page 116: Medical statistics Basic concept and applications [Square one]

Conclusion

A difference at least as large as the one observed would be expected in about 36 out of 100 cases if the null hypothesis were true (actual P value = 0.362), i.e. we do not reject the null hypothesis of no difference between the first and second test.

Page 117: Medical statistics Basic concept and applications [Square one]

Paired Samples Statistics

Pair 1                                Mean     N     Std. Deviation   Std. Error Mean
syst. blood pressure at start         151.20   278   21.997           1.319
syst. blood pressure after 2 years    153.83   278   29.076           1.744

Paired Samples Test

Pair 1: syst. blood pressure at start - syst. blood pressure after 2 years

Paired Differences: Mean = -2.63, Std. Deviation = 17.920, Std. Error Mean = 1.075, 95% Confidence Interval of the Difference = -4.74 to -.51

t = -2.443, df = 277, Sig. (2-tailed) = .015

Page 118: Medical statistics Basic concept and applications [Square one]

Test of significance: interval/ratio data

Parametric, assuming normal distribution

Known population variance (σ): one-sample Z-test, $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; rejection limit beyond ±1.96

Unknown population variance: t-test, reject if P ≤ 0.05. The choice depends on the number of samples:

One sample vs. population: one-sample t-test

Two samples, independent: independent t-test

Two samples, dependent: paired t-test

Page 119: Medical statistics Basic concept and applications [Square one]

The Chi-Square test χ2

Used for hypothesis testing for categorical variables.

There are many types, depending on the design, the distribution of the variables and the objectives of testing.

Page 120: Medical statistics Basic concept and applications [Square one]

χ2

Example:

Vaccination against influenza decreases the risk of getting the disease.

Study:

Compare the effectiveness of 5 vaccines with respect to the probability to get influenza.

Comparison will be in respect to a nominal variable (getting influenza: yes or no)

Page 121: Medical statistics Basic concept and applications [Square one]

Effectiveness of Five Vaccines

Data cross tabulated 2X5; response variable: Influenza

Frequency

Vaccines   Influenza No   Influenza Yes   Total
1          237            43              280
2          198            52              250
3          245            25              270
4          212            48              260
5          233            57              290
Total      1125           225             1350

% within Vaccines (the Influenza Yes column is the probability to get influenza)

Vaccines   Influenza No   Influenza Yes   Total
1          84.6           15.4            100
2          79.2           20.8            100
3          90.7           9.3             100
4          81.5           18.5            100
5          80.3           19.7            100
Total      83.3           16.7            100

The null hypothesis states that the probability to get influenza is independent of the vaccines. The alternative states that a dependency exists.

Page 122: Medical statistics Basic concept and applications [Square one]

Effectiveness of Five Vaccines

If H0 is true: the probability of getting influenza should be the same in every group and equal to the probability in the total population: 225/1350 = 0.167 (16.7%).

Vaccine 1 was used in 280 persons; if H0 is true, we expect 16.7% (≈47) of them to get influenza.

However, this is not what was observed: 43 got influenza.

Page 123: Medical statistics Basic concept and applications [Square one]

Expected frequencies

Vaccines       Influenza No   Influenza Yes   Total
1- Observed    237            43              280
   Expected    233.3          46.7
2- Observed    198            52              250
   Expected    208.3          41.7
3- Observed    245            25              270
   Expected    225.0          45.0
4- Observed    212            48              260
   Expected    216.7          43.3
5- Observed    233            57              290
   Expected    241.7          48.3
Total          1125           225             1350

For any cell: Expected frequency = Row total × Column total / Grand total

Examples: Vaccine 1, Influenza Yes: 280 × 225/1350 = 46.7; Vaccine 4, Influenza No: 260 × 1125/1350 = 216.7

Page 124: Medical statistics Basic concept and applications [Square one]

Pearson Chi-square test

Calculate the expected frequencies (assuming H0 is true) for all the ten cells.

Calculate Chi-square, where O_f = observed frequency and E_f = expected frequency:

$\chi^2 = \sum \dfrac{(O_f - E_f)^2}{E_f}$

Reject H0 if χ2 is large. Use the Chi-square distribution after determining the degrees of freedom (df): df = (r-1)*(c-1)

Page 125: Medical statistics Basic concept and applications [Square one]

Chi-square distribution

Page 126: Medical statistics Basic concept and applications [Square one]

Critical values for Chi-square

      Level of Significance
df    0.99      0.90     0.70     0.50     0.30     0.20     0.10     0.05     0.01     0.001
1     0.00016   0.0158   0.148    0.455    1.074    1.642    2.706    3.841    6.635    10.827
2     0.0201    0.211    0.713    1.386    2.408    3.219    4.605    5.991    9.210    13.815
3     0.115     0.584    1.424    2.366    3.665    4.642    6.251    7.815    11.341   16.268
4     0.297     1.064    2.195    3.357    4.878    5.989    7.779    9.488    13.277   18.465
5     0.554     1.610    3.000    4.351    6.064    7.289    9.236    11.070   15.086   20.517
.
.
30    14.953    20.599   25.508   29.336   33.530   36.250   40.256   43.773   50.892   59.703

χ2 critical = 9.488 (df = 4, level of significance 0.05)

Calculated χ2 = 16.555, df = (2-1)(5-1) = 4

P = 0.002

There is a relation (dependence) between type of vaccine and influenza prevention
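The vaccine example can be recomputed from the observed counts. This is a sketch under stated assumptions: the function name `chi_square` is illustrative, and the P value uses the closed-form upper tail of the Chi-square distribution that holds for df = 4 only, exp(-x/2)(1 + x/2), so that no statistics library is needed.

```python
import math

# Observed counts per vaccine: [Influenza No, Influenza Yes].
observed = [[237, 43], [198, 52], [245, 25], [212, 48], [233, 57]]

def chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

chi2, df = chi_square(observed)
# Upper-tail probability; this closed form is valid for df = 4 only.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi-square = {chi2:.3f}, df = {df}, P = {p_value:.3f}")
```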

Page 127: Medical statistics Basic concept and applications [Square one]

SMOKING * SEX Crosstabulation

                                  SEX
SMOKING                      male     female   Total
no        Count              90       124      214
          % within SMOKING   42.1%    57.9%    100.0%
smokers   Count              55       9        64
          % within SMOKING   85.9%    14.1%    100.0%
Total     Count              145      133      278
          % within SMOKING   52.2%    47.8%    100.0%

Chi-Square Tests

                               Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             38.017(b)   1    .000
Continuity Correction(a)       36.279      1    .000
Likelihood Ratio               41.649      1    .000
Fisher's Exact Test                                                     .000                   .000
Linear-by-Linear Association   37.880      1    .000
N of Valid Cases               278

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.

At least 80% of cells must have Ef > 5

Page 128: Medical statistics Basic concept and applications [Square one]

We can't use the Pearson Chi-square test if the expected frequency is < 5 in more than 20% of the cells.

In this case we use Fisher's Exact test

Page 129: Medical statistics Basic concept and applications [Square one]

status * SEX Crosstabulation

Count
                        male   female   Total
alive                   24     15       39
died from CVD           4      1        5
other cause of death    2      2        4
Total                   30     18       48

Expected f (died from CVD, female) = 5 × 18/48 = 1.875 (< 5)

Expected f (other cause of death, male) = 4 × 30/48 = 2.5 (< 5)

Fisher's Exact test provides a correction

Page 130: Medical statistics Basic concept and applications [Square one]

Chi-Square Tests

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .935(a)   2    .626
Likelihood Ratio               .991      2    .609
Linear-by-Linear Association   .004      1    .951
N of Valid Cases               48

a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is 1.50.

Chi-square is not valid
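The validity check that SPSS prints in footnote a can be reproduced by computing every expected count. A minimal sketch with illustrative names, using the status by sex counts from the cross tabulation:

```python
# Observed counts: rows = alive, died from CVD, other cause of death;
# columns = male, female.
observed = [[24, 15], [4, 1], [2, 2]]

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
cells = [e for row in expected for e in row]
n_small = sum(1 for e in cells if e < 5)     # cells with expected count below 5
share_small = n_small / len(cells)
print(f"{n_small} of {len(cells)} cells ({share_small:.1%}) have expected count < 5; "
      f"minimum = {min(cells):.2f}")
```

When more than 20% of the cells fall below 5, as here, the Pearson Chi-square approximation should not be relied on.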

Page 131: Medical statistics Basic concept and applications [Square one]

Chi-Square Tests

                               Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             38.017(b)   1    .000
Continuity Correction(a)       36.279      1    .000
Likelihood Ratio               41.649      1    .000
Fisher's Exact Test                                                     .000                   .000
Linear-by-Linear Association   37.880      1    .000
N of Valid Cases               278

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 30.62.

Page 132: Medical statistics Basic concept and applications [Square one]

McNemar test: paired data in a cross tabulation

                Ointment B +   Ointment B No   Total
Ointment A +    16             10              26
Ointment A No   23             5               28
Total           39             15              54

54 persons with eczema on both arms used ointment A on one arm and ointment B on the other (randomized).

The McNemar test takes only the discordant pairs into account:

$\chi^2 = \dfrac{(23 - 10)^2}{23 + 10}$, df = 1
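The McNemar statistic can be evaluated directly from the two discordant cells (23 and 10). This sketch follows the slide's formula without a continuity correction; the function name `mcnemar` is hypothetical, and the P value uses the fact that a Chi-square variable with df = 1 is the square of a standard normal variable.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and its P value, df = 1."""
    chi2 = (b - c) ** 2 / (b + c)
    # For df = 1: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Discordant pairs: improved with B only (23) and with A only (10).
chi2, p_value = mcnemar(23, 10)

CHI2_CRITICAL = 3.841  # critical value for df = 1 at the 0.05 level
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}, "
      f"significant: {chi2 > CHI2_CRITICAL}")
```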

Page 133: Medical statistics Basic concept and applications [Square one]
Page 134: Medical statistics Basic concept and applications [Square one]

Questions

Page 135: Medical statistics Basic concept and applications [Square one]

Thank you