2. inferential statistics


Uploaded by razif-shahril on 13-Jan-2017


Page 1: 2. Inferential statistics

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

BIOSTATISTICS (HFS3283)

INFERENTIAL STATISTICS

Dr. Mohd Razif Shahril

School of Nutrition & Dietetics

Faculty of Health Sciences

Universiti Sultan Zainal Abidin

Page 2

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN

Topic Learning Outcomes

At the end of this lecture, students should be able to:

• define inferential statistics;

• explain hypothesis tests, p values, and Type I and Type II errors;

• interpret a confidence interval.

Page 3

Page 4

What is INFERENTIAL STATISTICS?

• The statistical techniques used to generalize a result from a sample (a statistic) to the population it was drawn from (a parameter).

[Figure: Population and Sample]
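To make the statistic/parameter distinction concrete, the Python sketch below simulates it under invented assumptions: a hypothetical population of 10,000 BMI values (normal, mean 25 kg/m², SD 4 kg/m²) from which a simple random sample of 130 is drawn. In real research only the sample mean (the statistic) is observed; the population mean (the parameter) is what we try to infer.

```python
import random
import statistics

def simulate_sample_vs_population(pop_size: int = 10_000, n: int = 130, seed: int = 1):
    """Return (population mean, sample mean) for a made-up BMI population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Hypothetical population of BMI values (kg/m^2); the numbers are invented.
    population = [rng.gauss(25.0, 4.0) for _ in range(pop_size)]
    parameter = statistics.fmean(population)  # population mean (unknown in practice)
    sample = rng.sample(population, n)        # simple random sample, no replacement
    statistic = statistics.fmean(sample)      # sample mean (what we actually observe)
    return parameter, statistic

mu, xbar = simulate_sample_vs_population()
print(f"population mean = {mu:.2f}, sample mean = {xbar:.2f}")
```

Changing the seed gives different sample means scattered around the population mean; that sampling variability is what inferential methods take into account.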

Page 5

Two types of inferential statistics

• (1) Hypothesis tests

– e.g. comparing 2 means

– e.g. comparing 2 proportions

– e.g. testing the association between one variable and another

• (2) Estimation (confidence intervals)

– e.g. estimating a mean (numerical data)

– e.g. estimating a proportion (categorical data)

Page 6

Hypothesis

• A statement derived from the primary research question(s).

• A study may test more than one hypothesis, but the fewer the better.

• It usually comes from a hunch: an educated guess based on published results or preliminary observations.

• When writing a hypothesis:

– state the hypothesis (or hypotheses) clearly and specifically;

– include the study factor and the outcome factor.

Page 7

Research question

• Most studies are concerned with answering the following questions:

– What is the magnitude of a health problem or health factor?

– What is the causal relation between one factor (or factors) and the disease or outcome of interest?

– What is the efficacy of an intervention?

Page 8

Hypothesis testing

• In hypothesis testing, we answer a specific question about a population parameter (e.g. the population mean nutrition knowledge score) using a sample statistic (e.g. the sample mean nutrition knowledge score).

Page 9

Hypothesis testing

• In hypothesis testing, we answer a specific question about a population parameter (e.g. the population mean BMI) using a sample statistic (e.g. the sample mean BMI).

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

• Answer – Yes or No

– Null hypothesis (Ho: µ = 24.5)

– Alternative hypothesis (Ha: µ ≠ 24.5)

Page 10

Null hypothesis

• A mathematical statement of equality, stated before data collection or data analysis.

• After the statistical analysis, a p value is calculated.

• Based on this p value, the decision is made either to reject Ho or not to reject it.

Page 11

Null hypothesis (cont.)

Page 12

Hypothesis testing (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• This test is called the “one-sample t test”.

• At the end of the hypothesis test, we obtain a p value.

– If the p value is < 0.05, we reject the null hypothesis (Ho) and conclude in favour of Ha.

– If the p value is ≥ 0.05, we cannot reject the null hypothesis (Ho), so the conclusion stays with Ho.
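The decision rule above can be written as a small Python function. This is a minimal sketch: the function name `decide` and the wording of its messages are illustrative, and α = 0.05 is the conventional cut-off used in this lecture.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule for a hypothesis test."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value < alpha:
        return "reject Ho: conclude Ha (significantly different)"
    # p >= alpha, including p exactly equal to 0.05
    return "cannot reject Ho (not significantly different)"

for p in (0.01, 0.08):  # p values used in this lecture's BMI examples
    print(p, "->", decide(p))
```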

Page 13

Hypothesis testing (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• In the above example, if we get p = 0.01, we reject the null hypothesis (Ho), and then:

– We conclude in favour of the alternative hypothesis (Ha): “the mean BMI of the population is different from 24.5 kg/m²”.

– Alternatively, we may report: “the mean BMI is significantly different from 24.5 kg/m²”.

• Note:

– (1) The second wording is more common in the literature.

– (2) These are “statistical conclusions”, not yet “research conclusions”.

Page 14

Hypothesis testing (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• In the above example, if we get p = 0.08, we CANNOT reject the null hypothesis (Ho), and then:

– We conclude: “there is insufficient evidence that the mean BMI in the population is different from 24.5 kg/m²”.

– Alternatively, we may report: “the mean BMI is NOT significantly different from 24.5 kg/m²”.

Page 15

Hypothesis testing (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• The one-sample t test has assumptions (requirements) that we need to check:

– (1) The sample is selected by random sampling.

– (2) The observations are independent.

– (3) The data are normally distributed (the “normality assumption”).

Page 16

Steps in Hypothesis Testing

Step (1): Generate the null and alternative hypotheses

Step (2): Set the significance level (α)

Step (3): Decide which statistical test to use and check the assumptions of the test

Step (4): Calculate the test statistic and its associated p value

Step (5): Interpret the result (based on the p value and CI)

Step (6): Draw a conclusion
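As a sketch of Steps (1) to (6), the Python code below runs the one-sample t test for the BMI example from its summary statistics (mean = 26.1, SD = 4.3, n = 130, µ0 = 24.5). Python's standard library has no t distribution, so the hypothetical helpers `t_pdf` and `t_two_sided_p` obtain the two-sided p value by integrating the Student t density numerically with Simpson's rule; statistical software would report this p value directly. Checking the assumptions (Step 3) needs the raw data, which the slides do not give, so that step is only noted in a comment.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|), using symmetry
    return max(0.0, 1.0 - central_area)

def one_sample_t_test(mean: float, sd: float, n: int, mu0: float, alpha: float = 0.05):
    # Step 1: Ho: mu = mu0, Ha: mu != mu0 (two-sided)
    # Step 2: significance level alpha (0.05 by default)
    # Step 3: one-sample t test; random sampling, independence and normality
    #         must be checked on the raw data (not available here)
    # Step 4: test statistic and p value from the summary statistics
    se = sd / math.sqrt(n)
    t = (mean - mu0) / se
    df = n - 1
    p = t_two_sided_p(t, df)
    # Steps 5-6: compare p with alpha and state the statistical conclusion
    decision = "reject Ho" if p < alpha else "cannot reject Ho"
    return t, df, p, decision

t, df, p, decision = one_sample_t_test(mean=26.1, sd=4.3, n=130, mu0=24.5)
print(f"t = {t:.2f}, df = {df}, p = {p:.1e}, {decision}")
```

For this example t ≈ 4.24 with df = 129, and p is far below 0.05, so Ho is rejected.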

Page 17

P value?

• The p value is the probability of obtaining a result (e.g. a difference) at least as extreme as the one observed, assuming the null hypothesis is true, i.e. assuming the result is due to chance alone.

• A p value of 0.05 means that, if there were really no difference between the groups, a difference at least as large as the one observed would arise by chance 5% of the time.
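Formally, the p value is the probability, assuming Ho is true, of a result at least as extreme as the one observed, and this can be checked by simulation. The Python sketch below assumes Ho is true (BMI normally distributed with µ = 24.5 kg/m² and SD = 4.3 kg/m², n = 130) and counts how often a sample mean lands at least as far from 24.5 as a hypothetical observed mean of 25.0 kg/m². The value 25.0 is invented for illustration; with the lecture's 26.1 kg/m² such results are too rare to show up in a short simulation. The simulated proportion is compared with a normal-approximation p value from `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist, fmean

def simulated_p_value(observed_mean: float, mu0: float, sd: float, n: int,
                      reps: int = 5000, seed: int = 42) -> float:
    """Share of samples drawn under Ho whose mean is at least as extreme as observed."""
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - mu0)
    extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sd) for _ in range(n)]  # Ho is true by construction
        if abs(fmean(sample) - mu0) >= observed_gap:
            extreme += 1
    return extreme / reps

def normal_approx_p_value(observed_mean: float, mu0: float, sd: float, n: int) -> float:
    """Two-sided p value treating the sample mean as normal with SE = sd / sqrt(n)."""
    z = abs(observed_mean - mu0) / (sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(z))

# 25.0 is a hypothetical observed mean; SD and n follow the BMI example.
print(simulated_p_value(25.0, 24.5, 4.3, 130), normal_approx_p_value(25.0, 24.5, 4.3, 130))
```

The two values should agree closely; the normal-approximation p value is about 0.185, meaning roughly 18 to 19% of samples drawn under Ho are at least this far from 24.5.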

Page 18

P value? (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• If the p value is < 0.05, we reject the null hypothesis.

• The p value tells us how risky it is to reject the null hypothesis and conclude in favour of the alternative hypothesis: the smaller the p value, the less compatible the data are with Ho.

• Example: p = 0.01. It means that:

– If Ho were true, a result at least this extreme would occur only 1% of the time, so concluding in favour of the alternative hypothesis (“significantly different”) carries a small risk of error.

– We normally accept less than a 5% risk of this kind of error. That is why the cut-off point for the p value is 0.05.

Page 19

P value? (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• If the p value is < 0.05, we reject the null hypothesis.

• The p value tells us how risky it is to reject the null hypothesis and conclude in favour of the alternative hypothesis: the smaller the p value, the less compatible the data are with Ho.

• Example: p = 0.2. It means that:

– If Ho were true, a result at least this extreme would occur 20% of the time, so concluding in favour of the alternative hypothesis (“significantly different”) would carry too high a risk of error.

– Therefore, we cannot conclude that it is “significantly different”; we conclude that “the difference is not significant”.

Page 20

P value? (cont.)

• RQ – Is the mean BMI of the population different from 24.5 kg/m² or not?

– Ho: µ = 24.5; Ha: µ ≠ 24.5

– Sample: mean = 26.1 kg/m², SD = 4.3 kg/m², n = 130

• If the p value is < 0.05, we reject the null hypothesis.

• This means we have set the cut-off point for rejecting Ho at p < 0.05.

• This is called setting “alpha” (α) at 0.05, because the type of error we have been talking about (rejecting Ho when it is actually true) is called a “Type I error” or “alpha error”.
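Setting α = 0.05 means that, when Ho is actually true, the test will wrongly reject it in about 5% of studies. The Python sketch below checks this by simulation under assumed conditions: normally distributed BMI with µ = 24.5 kg/m² and SD = 4.3 kg/m², n = 130, and, for simplicity, a two-sided z test with the SD treated as known (a stand-in for the one-sample t test used in this lecture).

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(mu0: float = 24.5, sd: float = 4.3, n: int = 130,
                        alpha: float = 0.05, reps: int = 4000, seed: int = 7) -> float:
    """Fraction of simulated tests that reject Ho when Ho is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sd / math.sqrt(n)  # SD treated as known (simplifying assumption)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sd) for _ in range(n)]  # data generated under Ho
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:  # same as p < alpha for a two-sided z test
            rejections += 1
    return rejections / reps

print(f"observed Type I error rate: {type_one_error_rate():.3f}")
```

The printed rate should be close to 0.05; setting `alpha=0.10` moves it to about 0.10.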

Page 21

One-sided or Two-sided test?

• RQa – Is the mean BMI of the population different from 24.5 kg/m² or not? (two-sided hypothesis test)

– Ho: µ = 24.5

– Ha: µ ≠ 24.5

• RQb – Is the mean BMI of the population more than 24.5 kg/m² or not? (one-sided hypothesis test)

– Ho: µ ≤ 24.5

– Ha: µ > 24.5

• RQc – Is the mean BMI of the population less than 24.5 kg/m² or not? (one-sided hypothesis test)

– Ho: µ ≥ 24.5

– Ha: µ < 24.5
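The practical difference between the two kinds of test shows up in how the p value is computed. The Python sketch below uses a standard normal (z) reference distribution as a simplification and a made-up test statistic z = 1.80; for a t test the idea is the same with the t distribution.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """Two-sided and one-sided p values for a z statistic (standard normal reference)."""
    std = NormalDist()
    return {
        "two_sided": 2 * (1 - std.cdf(abs(z))),  # Ha: mu != mu0
        "greater": 1 - std.cdf(z),               # Ha: mu > mu0
        "less": std.cdf(z),                      # Ha: mu < mu0
    }

for label, p in p_values(1.80).items():  # z = 1.80 is a made-up test statistic
    print(f"{label}: p = {p:.3f}")
```

With z = 1.80, the two-sided p value is about 0.072 (not significant at α = 0.05), while the one-sided p value for Ha: µ > µ0 is about 0.036 (significant).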

Page 22

One-sided or Two-sided test? (cont.)

Page 23

Type I (α) and Type II (β) Error

Usually we set:

• Type I (α) error = 5%

• Type II (β) error = 20%

Page 24

Type I (α) and Type II (β) Error (cont.)

Power = 1 − β

Usually, power = 1 − 0.2 = 0.8, or 80%
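Power and sample size are linked through α, β, the SD and the smallest difference worth detecting. The Python sketch below uses the normal-approximation formulas for a one-sample, two-sided test, $n = \left((z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\delta\right)^2$. The difference δ = 1.0 kg/m² is hypothetical, σ = 4.3 kg/m² is borrowed from the BMI example, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def power_one_sample(delta: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided one-sample z test."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * math.sqrt(n) / sd  # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

def sample_size_one_sample(delta: float, sd: float, alpha: float = 0.05,
                           power: float = 0.80) -> int:
    """Smallest n giving the target power (normal approximation)."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std.inv_cdf(power)           # about 0.84 for power = 0.80 (beta = 0.20)
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)

n_needed = sample_size_one_sample(delta=1.0, sd=4.3)  # delta = 1.0 kg/m^2 is hypothetical
print(n_needed, round(power_one_sample(1.0, 4.3, n_needed), 3))
```

With α = 5% and power = 80% (β = 20%), this gives n = 146; with 145 participants the power falls just below 80%.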

Page 25

Type I (α) and Type II (β) Error (cont.)

Page 26

Estimation (Confidence Interval)

• Used to express the degree of confidence in an estimate.

• Gives the range of plausible values for the true population value, with a stated degree of assurance (e.g. 95%).

• Indicates how reliable (precise) an estimate is: the narrower the interval, the more precise the estimate.

• E.g.

– If the mean systolic BP is 118 mmHg and the 95% CI is (110, 125), you can be 95% confident that the true mean systolic BP lies between 110 mmHg and 125 mmHg.
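The “95% confident” wording refers to the method: if we took many random samples and built a 95% CI from each, about 95% of those intervals would contain the true mean. The Python sketch below checks this under invented assumptions: a normal population of systolic BP with true mean 118 mmHg and SD 20 mmHg, samples of n = 50, and, for simplicity, a known-SD interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.

```python
import math
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mean: float = 118.0, sd: float = 20.0, n: int = 50,
                level: float = 0.95, reps: int = 4000, seed: int = 3) -> float:
    """Share of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd / math.sqrt(n)         # known-SD interval (simplification)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(true_mean, sd) for _ in range(n))  # one random sample
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

print(f"coverage: {ci_coverage():.3f}")
```

The printed coverage should be close to 0.95.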

Page 27

Estimating a Mean

• Formula: $\bar{x} \pm t_{\alpha/2} \times SE$, where $SE = SD/\sqrt{n}$

• Calculation example:

– Systolic BP of 100 students in a class: mean = 123.4 mmHg, SD = 14.0 mmHg

– 95% CI = $\bar{x} \pm t_{\alpha/2} \times (SD/\sqrt{n}) = 123.4 \pm (1.98 \times 14.0/10) = 123.4 \pm 2.8$, where 1.98 is the two-sided 95% t value for df = n − 1 = 99 and $\sqrt{100} = 10$

– The 95% CI ranges from 120.6 mmHg to 126.2 mmHg.

– We can be 95% confident that this range includes the true population mean.
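The same calculation in Python, as a minimal sketch: `mean_ci` is a hypothetical helper, and the critical value 1.98 is passed in from a t table (df = 99), as on the slide, because Python's standard library has no t distribution.

```python
import math
from typing import Tuple

def mean_ci(mean: float, sd: float, n: int, t_crit: float) -> Tuple[float, float]:
    """CI for a mean: mean ± t_crit * SD / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    margin = t_crit * se
    return mean - margin, mean + margin

# Slide example: n = 100, mean = 123.4 mmHg, SD = 14.0 mmHg.
# t_crit = 1.98 is the two-sided 95% t value for df = 99, taken from a t table.
low, high = mean_ci(123.4, 14.0, 100, t_crit=1.98)
print(f"95% CI: {low:.1f} to {high:.1f} mmHg")  # 95% CI: 120.6 to 126.2 mmHg
```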

Page 28

Assumptions or Requirements

The calculation is valid only:

1) when the sample is selected by random sampling;

2) when the observations are independent;

3) when the data are normally distributed (the “normality assumption”).

Page 29

Estimating a Proportion

• Formula: $p \pm Z_{\alpha/2} \times SE$, where $SE = \sqrt{p(1-p)/n}$

• Calculation example:

– Abnormal systolic BP among 100 students in a class: prevalence = 37%

– 95% CI = $0.37 \pm \left(1.96 \times \sqrt{0.37 \times 0.63/100}\right) = 0.37 \pm 0.095$

– The 95% CI ranges from 0.275 to 0.465 (about 27.5% to 46.5%).

– We can be 95% confident that this range includes the true population proportion.
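A Python sketch of the slide's formula, which is the normal-approximation (Wald) interval. `proportion_ci` is a hypothetical helper; it takes the critical value from `statistics.NormalDist` (about 1.96) rather than the rounded 1.96, which does not change the result at three decimal places.

```python
import math
from statistics import NormalDist
from typing import Tuple

def proportion_ci(p: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wald CI for a proportion: p ± z * sqrt(p(1 - p) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = math.sqrt(p * (1 - p) / n)            # standard error of the proportion
    return p - z * se, p + z * se

low, high = proportion_ci(0.37, 100)  # slide example: prevalence 37%, n = 100
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.275 to 0.465
```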

Page 30

Assumptions or Requirements

The calculation is valid only:

1) when the sample is selected by random sampling;

2) when the observations are independent;

3) when BOTH n × p and n × (1 − p) are greater than 5.

Note:

• Recode the variable into 0 and 1:

– 1 should be ‘disease’ or the condition of interest;

– 0 should be ‘non-disease’ or the rest.
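The recoding rule and the n × p, n × (1 − p) check can be sketched in Python as follows. The helper names and the data (37 ‘abnormal’ and 63 ‘normal’ readings, chosen to match the 37% prevalence example) are hypothetical. Because the variable is coded 0/1, the number of 1s is n × p and the number of 0s is n × (1 − p).

```python
from typing import List, Sequence

def recode(values: Sequence[str], condition: str) -> List[int]:
    """Recode a categorical variable: 1 = condition of interest, 0 = everything else."""
    return [1 if v == condition else 0 for v in values]

def proportion_ci_ok(coded: Sequence[int], minimum: int = 5) -> bool:
    """Check that n*p and n*(1 - p), the counts of 1s and 0s, both exceed minimum."""
    n = len(coded)
    cases = sum(coded)       # n * p
    non_cases = n - cases    # n * (1 - p)
    return cases > minimum and non_cases > minimum

# Hypothetical data shaped like the slide example: 37 abnormal, 63 normal readings.
bp_status = ["abnormal"] * 37 + ["normal"] * 63
coded = recode(bp_status, "abnormal")
print(sum(coded) / len(coded), proportion_ci_ok(coded))  # 0.37 True
```

Here n × p = 37 and n × (1 − p) = 63, so the requirement is met.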

Page 31

P value and Confidence Interval

• When using statistics to compare 2 groups, 2 approaches can be used:

– calculation of a p value;

– calculation of a confidence interval.

• p values and confidence intervals are complementary and should be used together.
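The two approaches agree: a two-sided test at α = 0.05 rejects Ho exactly when the matching 95% CI excludes the null value. The Python sketch below shows this for a single estimate and its standard error, using a z (normal) approximation as a simplification; the function name is hypothetical. The example reuses the one-sample BMI numbers, but the same function applies to a difference between 2 groups if `estimate` is the difference and `se` is its standard error.

```python
from statistics import NormalDist

def z_test_and_ci(estimate: float, se: float, null_value: float = 0.0, alpha: float = 0.05):
    """Two-sided z-test p value and the matching (1 - alpha) CI for one estimate."""
    std = NormalDist()
    z = (estimate - null_value) / se
    p = 2 * (1 - std.cdf(abs(z)))
    z_crit = std.inv_cdf(1 - alpha / 2)
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return p, ci

# BMI example: mean 26.1, SD 4.3, n = 130, tested against 24.5 (z approximation).
se = 4.3 / 130 ** 0.5
p, (low, high) = z_test_and_ci(26.1, se, null_value=24.5)
excludes_null = not (low <= 24.5 <= high)
print(f"p = {p:.5f}, 95% CI = ({low:.2f}, {high:.2f}), CI excludes 24.5: {excludes_null}")
```

For the BMI example, p < 0.05 and the 95% CI (about 25.36 to 26.84 kg/m²) does not contain 24.5, so both approaches give the same conclusion, while the CI also shows how large the difference might be.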

Page 32

Limitations of the P value

• p values are sensitive to sample size:

– for the same difference, the larger the sample size, the smaller the p value.

• p values are sensitive to the magnitude of the difference between the two groups:

– if the difference is small but the samples are large, the result can still be statistically significant;

– if the difference is large but the samples are small, the result may not be statistically significant.
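Both limitations can be shown with a two-group comparison. The Python sketch below computes a two-sided p value for a difference in means between two equal-sized groups, assuming a common known SD and a normal approximation; the differences, SD and group sizes are all hypothetical, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p value for a difference in means between two equal-sized groups,
    assuming a common, known SD (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical: the same small 1 mmHg difference in systolic BP, SD = 14 mmHg.
for n in (20, 200, 2000):
    print(n, round(two_sample_z_p(1.0, 14.0, n), 4))
# Hypothetical: a large 10 mmHg difference with only 5 per group.
print(5, round(two_sample_z_p(10.0, 14.0, 5), 3))
```

A 1 mmHg difference gives p ≈ 0.82 with 20 per group but p ≈ 0.024 with 2,000 per group, whereas a 10 mmHg difference with only 5 per group gives p ≈ 0.26.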

Page 33

Thank You