probability & statistical inference lecture 8 msc in computing (data analytics)

Post on 19-Dec-2015


Probability & Statistical Inference Lecture 8

MSc in Computing (Data Analytics)

Introduction

In the previous lecture we were concerned with the analysis of data where we compared the means of two samples.

Frequently data contain more than two samples; for example, an experiment may compare several treatments.

In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called 'Analysis of Variance', or ANOVA for short.

Total Sum of Squares

Data set: 14, 12, 10, 6, 4, 2

Group A: 6, 4, 2

Group B: 14, 12, 10

Overall mean: 8

Total Sum of Squares:

$SST = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$
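The arithmetic above can be reproduced with a minimal Python sketch. The variable names (`data`, `overall_mean`, `sst`) are illustrative choices, not part of the lecture.

```python
# Data set from the lecture example (Group B followed by Group A).
data = [14, 12, 10, 6, 4, 2]

# Overall mean of all six observations.
overall_mean = sum(data) / len(data)

# Total sum of squares: squared deviations of every observation
# from the overall mean.
sst = sum((x - overall_mean) ** 2 for x in data)

print(overall_mean)  # 8.0
print(sst)           # 112.0
```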

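Between Group Variation

Sum of Squares of the Model:

$SSM = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2 = 3(8-4)^2 + 3(8-12)^2 = 96$

where $\mu$ is the overall mean, $\mu_a$ and $\mu_b$ are the means of groups A and B, and $n_a$ and $n_b$ are the group sizes.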
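As a quick check of the between-group calculation, here is a small Python sketch using the two groups from the example. The helper `mean` and the variable names are assumptions made for illustration.

```python
# Groups from the lecture example.
group_a = [6, 4, 2]
group_b = [14, 12, 10]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Overall mean across both groups.
grand_mean = mean(group_a + group_b)

# Model (between-group) sum of squares: each group's size times the
# squared distance of its mean from the overall mean.
ssm = sum(len(g) * (grand_mean - mean(g)) ** 2 for g in (group_a, group_b))

print(ssm)  # 96.0
```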

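Within Group Variation

Sum of Squares of the Error:

$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$

$= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2 = 16$

where $x_{ij}$ is the $j$-th observation in group $i$, $\bar{x}_i$ is the mean of group $i$, $n_i$ is the number of observations in group $i$ and $k$ is the number of groups.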
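The within-group sum of squares can be checked the same way. This is a minimal sketch on the example data; `mean` and `groups` are illustrative names.

```python
# Groups from the lecture example.
groups = [[6, 4, 2], [14, 12, 10]]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Error (within-group) sum of squares: squared deviations of each
# observation from the mean of its own group.
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(sse)  # 16.0
```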

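Structure of the Data

Group   Observations              Total   Mean
1       x_11  x_12  ...  x_1n     x_1.    x̄_1
2       x_21  x_22  ...  x_2n     x_2.    x̄_2
.       .                         .       .
.       .                         .       .
a       x_a1  x_a2  ...  x_an     x_a.    x̄_a
Total                                     x̄

Here x_i. denotes the total and x̄_i the mean of the observations in group i, and x̄ is the overall mean.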

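ANOVA Table

Source   Degrees of Freedom   Sum of Squares   Mean Square           F-Stat
Model    a - 1                SSM              MSM = SSM / (a - 1)   MSM / MSE
Error    n - a                SSE              MSE = SSE / (n - a)
Total    n - 1                SST              SST / (n - 1)

The sums of squares are:

$SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2$

$SSM = \sum_{i=1}^{a} n_i(\bar{x}_i - \bar{x})^2$

$SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$

Where: n is the total sample size, a is the number of groups, $n_i$ is the number of observations in group $i$, $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the overall mean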

ANOVA Table – Original Example

Source   Degrees of Freedom   Sum of Squares   Mean Square   F-Stat
Model    2 - 1 = 1            96               96            24
Error    6 - 2 = 4            16               4
Total    6 - 1 = 5            112

Where: n = 6 is the sample size and k = 2 is the number of groups
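The whole table can be rebuilt from the raw data. The sketch below is one possible way to do it in plain Python; the function name `one_way_anova` and the dictionary keys are hypothetical, chosen only for this illustration. It does not compute a p-value, which would need the F distribution.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    n = sum(len(g) for g in groups)   # total sample size
    k = len(groups)                   # number of groups
    grand_mean = mean([x for g in groups for x in g])

    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_model, df_error = k - 1, n - k
    msm, mse = ssm / df_model, sse / df_error
    return {
        "df_model": df_model, "df_error": df_error, "df_total": n - 1,
        "SSM": ssm, "SSE": sse, "SST": ssm + sse,
        "MSM": msm, "MSE": mse, "F": msm / mse,
    }


# Group A and Group B from the lecture example.
table = one_way_anova([[6, 4, 2], [14, 12, 10]])
print(table["F"])  # 24.0
```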

Model Assumptions

Independence of observations within and between samples.

Normality of the sampling distribution.

Equal variance. This is also called the homoscedasticity assumption.
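An informal first look at the equal-variance assumption is to compare the sample variances of the groups. The sketch below does this for the two-group example from earlier in the lecture. It is a rough descriptive check under that assumption, not a formal test, and the names `groups`, `variances` and `ratio` are illustrative.

```python
import statistics

# Two-group example data from earlier in the lecture.
groups = {"A": [6.0, 4.0, 2.0], "B": [14.0, 12.0, 10.0]}

# Sample variance (n - 1 denominator) of each group.
variances = {name: statistics.variance(obs) for name, obs in groups.items()}

# Ratio of largest to smallest variance; values near 1 are
# consistent with the equal-variance assumption.
ratio = max(variances.values()) / min(variances.values())

print(variances)  # {'A': 4.0, 'B': 4.0}
print(ratio)      # 1.0
```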

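The ANOVA Equation

We can describe the observations in the above table using the following equation:

$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n$

Where: $\mu$ is the overall mean, $\tau_i$ is the effect of group $i$, $\epsilon_{ij}$ is a random error term, a is the number of groups and n is the number of observations in each group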
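To make the model concrete, the sketch below generates data from an effects model of the form Y_ij = mu + tau_i + eps_ij, with normally distributed errors. The function name `simulate_one_way`, its parameters and the seed are all hypothetical choices for this illustration.

```python
import random


def simulate_one_way(mu, taus, n, sigma, seed=0):
    """Draw n observations per group from Y_ij = mu + tau_i + eps_ij,
    where eps_ij is normal with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return [
        [mu + tau + rng.gauss(0.0, sigma) for _ in range(n)]
        for tau in taus
    ]


# With no noise, every observation equals its group mean mu + tau_i.
noiseless = simulate_one_way(mu=8.0, taus=[-4.0, 4.0], n=3, sigma=0.0)
print(noiseless)  # [[4.0, 4.0, 4.0], [12.0, 12.0, 12.0]]

# With noise, observations scatter around those group means.
noisy = simulate_one_way(mu=8.0, taus=[-4.0, 4.0], n=3, sigma=2.0)
```

The overall mean 8 and the effects -4 and +4 are chosen to match the group means 4 and 12 of the earlier two-group example.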

ANOVA Hypotheses

We wish to test the hypotheses:
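For a one-way layout these hypotheses are conventionally written in terms of the group effects. A sketch in LaTeX, assuming the effects model $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ with $a$ groups (the symbols $\tau_i$ are an assumption of this sketch):

```latex
H_0:\ \tau_1 = \tau_2 = \cdots = \tau_a = 0
\qquad
H_1:\ \tau_i \neq 0 \ \text{for at least one } i
```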

The analysis of variance partitions the total variability into two parts.
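The two parts are the model (between-group) and error (within-group) sums of squares. As a worked check, the identity can be written out with the values from the two-group data set at the start of the lecture:

```latex
SST = SSM + SSE
\qquad\Longrightarrow\qquad
112 = 96 + 16
```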

Example

Graphical Display of Data

Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment

Example

We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:

The ANOVA table is below:

Example

The p-value is less than 0.05, therefore H0 can be rejected and we can conclude that at least one of the hardwood concentrations affects the mean tensile strength of the paper.

Demo

Confidence Interval about the mean

For 20% hardwood, the resulting confidence interval on the mean is

Confidence Interval on the Difference of Two Treatments

For the hardwood concentration example,
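As a sketch of the usual textbook form of these two intervals, assuming the equal-variance model and using MSE from the ANOVA table as the variance estimate: the first expression is a confidence interval on the mean of group $i$, the second on the difference between groups $i$ and $j$. The symbols $n_i$, $n_j$ (group sizes) and $t_{\alpha/2,\,n-a}$ (critical value of the t distribution with the error degrees of freedom) are assumptions of this sketch.

```latex
\bar{x}_i \pm t_{\alpha/2,\,n-a}\sqrt{\frac{MSE}{n_i}}
\qquad\text{and}\qquad
(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2,\,n-a}
\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}
```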

An Unbalanced Experiment

Multiple Comparisons Following the ANOVA

The least significant difference (LSD) is

If the sample sizes are different in each treatment:
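A sketch of the standard form of Fisher's LSD for groups $i$ and $j$, written for possibly unequal group sizes. The symbols $n_i$, $n_j$ and $t_{\alpha/2,\,n-a}$ are assumptions of this sketch; MSE is the error mean square from the ANOVA table.

```latex
LSD = t_{\alpha/2,\,n-a}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}
```

When every group has the same number of observations $m$, the square root reduces to $\sqrt{2\,MSE/m}$. Two group means are declared significantly different when $|\bar{x}_i - \bar{x}_j| > LSD$.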

Example: Multi-comparison Test

