Chi-Square Goodness of Fit

What is a Chi-Square Test of Goodness of Fit?

Questions of goodness of fit have become increasingly important in modern statistics.

Questions of goodness of fit juxtapose complex observed patterns against hypothesized or previously observed patterns to test overall and specific differences among them.

We compare what was observed with what was hypothesized and look at the difference.

If the difference is small, then the FIT IS GOOD.

For example:

Observed       Hypothesized    Difference
51% females    50% females     1%


If the difference is BIG, then the FIT IS NOT GOOD.

For example:

Observed       Hypothesized    Difference
50% females    22% females     28%

Here is an example: We want to know if a sample we have selected matches the national percentage of a certain ethnic group.

2% of the sample is made up of members of this ethnic group.

10% of the population is made up of this ethnic group.

8% difference

You will use certain statistical methods to determine if the goodness of fit is significant or not.

Here is an example:

Problem: The chair of a statistics department suspects that some of her faculty are more popular with students than others.

There are three sections of introductory stats that are taught at the same time in the morning by Professors Cauforek, Kerr, and Rector.

66 students are planning on enrolling in one of the three classes.

What would you expect the number of enrollees to be in each class if popularity were not an issue?

Professor Cauforek    Professor Kerr    Professor Rector
22                    22                22

This is our expected value: 66 students divided evenly among 3 classes is 22 per class.
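The expected counts can be reproduced with a short sketch. It assumes, as the example does, that "popularity is not an issue" means every section gets an equal share; the variable names are illustrative and not part of the original slides.

```python
# Expected enrollment per section if popularity plays no role.
# Names are illustrative; the numbers come from the example.
professors = ["Cauforek", "Kerr", "Rector"]
total_students = 66

# Under "no preference", every section receives an equal share.
expected_per_class = total_students / len(professors)
expected = {name: expected_per_class for name in professors}

print(expected)  # {'Cauforek': 22.0, 'Kerr': 22.0, 'Rector': 22.0}
```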

Now let’s see what was observed. The number who enrolled in each class was:

            Professor Cauforek    Professor Kerr    Professor Rector
Observed    31                    25                10

We will test the degree to which the observed data fit the expected enrollments:

            Professor Cauforek    Professor Kerr    Professor Rector
Observed    31                    25                10
Expected    22                    22                22

Here is the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

$\chi^2$ = Chi-square

$\Sigma$ = sum of

$O$ = observed score (here: 31, 25, 10)

$E$ = expected score (here: 22, 22, 22)
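As a minimal sketch, the formula translates almost directly into code. The function name `chi_square` is a hypothetical helper chosen for illustration; it simply sums (O − E)² / E over the categories.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Sanity check: when observed equals expected, every difference is zero,
# so the statistic is zero (a perfect fit).
perfect_fit = chi_square([22, 22, 22], [22, 22, 22])
print(perfect_fit)  # 0.0
```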

Here is the null-hypothesis:

There is no significant difference between the expected and the observed number of students enrolled in the three stats professors’ classes.

Now we will compute the chi-square value and compare it with the critical value.

• If the chi-square value exceeds the critical value, then we will reject the null-hypothesis.

• If the chi-square value DOES NOT exceed the critical value, then we will fail to reject the null-hypothesis.

Let’s compute the chi-square value.

            Professor Cauforek    Professor Kerr    Professor Rector
Expected    22                    22                22
Observed    31                    25                10

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

OR, with one term per professor:

$$\chi^2 = \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E}$$

Let’s input each professor’s data into the equation.

$$\chi^2 = \frac{(31 - 22)^2}{22} + \frac{(25 - 22)^2}{22} + \frac{(10 - 22)^2}{22}$$

Now for the calculation:

$$\chi^2 = \frac{(9)^2}{22} + \frac{(3)^2}{22} + \frac{(-12)^2}{22}$$

$$\chi^2 = \frac{81}{22} + \frac{9}{22} + \frac{144}{22}$$

Convert the fractions into decimals:

$$\chi^2 \approx 3.7 + 0.4 + 6.5$$

Sum the terms:

$$\chi^2 \approx 10.6$$
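The arithmetic above can be checked term by term. This sketch uses only the numbers from the example; it also shows the unrounded sum, since the slides round each term to one decimal before adding.

```python
observed = [31, 25, 10]
expected = [22, 22, 22]

# One (O - E)^2 / E term per professor.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
exact = sum(terms)

# Each term rounded to one decimal, as in the worked example.
rounded_terms = [round(t, 1) for t in terms]

print(rounded_terms)    # [3.7, 0.4, 6.5]
print(round(exact, 3))  # 10.636
```

The exact value (234/22) still rounds to 10.6, so rounding the individual terms does not change the result here.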

As a contrasting example, note what the chi-square value would be if the observed and expected values were more similar:

            Professor Cauforek    Professor Kerr    Professor Rector
Expected    22                    22                22
Observed    24                    22                20

$$\chi^2 = \frac{(24 - 22)^2}{22} + \frac{(22 - 22)^2}{22} + \frac{(20 - 22)^2}{22}$$

$$\chi^2 = \frac{(2)^2}{22} + \frac{(0)^2}{22} + \frac{(-2)^2}{22}$$

$$\chi^2 = \frac{4}{22} + \frac{0}{22} + \frac{4}{22}$$

$$\chi^2 \approx 0.2 + 0.0 + 0.2$$

$$\chi^2 \approx 0.4$$
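To see the contrast side by side, the sketch below computes the statistic for both sets of observed enrollments against the same expected values. The helper `chi_square` and the scenario labels are illustrative names, not part of the original example.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [22, 22, 22]
scenarios = {
    "far from expected": [31, 25, 10],
    "close to expected": [24, 22, 20],
}

results = {label: chi_square(obs, expected) for label, obs in scenarios.items()}
for label, value in results.items():
    print(f"{label}: chi-square = {value:.2f}")
# far from expected: chi-square = 10.64
# close to expected: chi-square = 0.36
```

The unrounded value for the second case is 8/22, about 0.36, which matches the 0.4 obtained from the rounded terms.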

So the moral of the story is that the closer the expected and observed values are to one another, the smaller the chi-square value and the greater the goodness of fit (as seen below).

Expected    22    22    22
Observed    24    22    20

$$\chi^2 \approx 0.4$$

On the other hand, the farther the expected and observed values are from one another, the larger the chi-square value and the poorer the goodness of fit (as seen below).

Expected    22    22    22
Observed    31    25    10

$$\chi^2 \approx 10.6$$

Now we determine if a chi-square value of 10.6 exceeds the critical chi-square value.

To find the critical chi-square value, we first must determine the degrees of freedom as well as set the probability level.

The probability or alpha level means the probability of a type 1 error we are willing to live with (i.e., the probability of rejecting the null hypothesis when it is actually true). Generally this value is 0.05, which is like saying we are willing to be wrong 5 out of 100 times when we reject the null-hypothesis.

Degrees of freedom are calculated by taking the number of groups and subtracting 1. (Three groups minus 1 = 2.)

We now have all of the information we need to determine the critical chi-square value.

We go to the Chi-Square Distribution Table and locate the degrees of freedom (df = 2). And then we locate the probability or alpha level (0.050). Where these two values intersect in the table, we find the critical chi-square value (5.99).

df    0.100    0.050    0.025
1     2.71     3.84     5.02
2     4.61     5.99     7.38
3     6.25     7.82     9.35
4     7.78     9.49     11.14
5     9.24     11.07    12.83
6     10.64    12.59    14.45
7     12.02    14.07    16.01
8     13.36    15.51    17.54
9     14.68    16.92    19.02
…     …        …        …
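The table lookup can be sketched in code. The dictionary below copies only the 0.050 column of the table; the name `CRITICAL_AT_05` is hypothetical. As a cross-check, and only for df = 2, the chi-square upper-tail probability equals exp(−x/2), so the critical value is −2 ln(alpha); this shortcut is an added assumption-free identity for df = 2, not something the slides use.

```python
import math

# 0.050 column of the chi-square table, keyed by degrees of freedom.
CRITICAL_AT_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
                  6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92}

n_groups = 3       # three professors
df = n_groups - 1  # degrees of freedom = groups - 1
critical = CRITICAL_AT_05[df]

# Cross-check valid for df = 2 only: critical value = -2 * ln(alpha).
closed_form = -2 * math.log(0.05)

print(df, critical, round(closed_form, 2))  # 2 5.99 5.99
```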



Since the chi-square goodness of fit value (10.6) exceeds the critical value (5.99), we will reject the null hypothesis:

There actually is a significant difference between the expected and the observed enrollments.
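Putting the pieces together, the whole decision can be sketched in a few lines. The critical value 5.99 is taken from the table (df = 2, alpha = 0.05). The p-value line is an addition that goes beyond the slides: it relies on the fact that, for df = 2 only, the upper-tail probability of the chi-square distribution is exp(−x/2).

```python
import math

observed = [31, 25, 10]
expected = [22, 22, 22]
critical = 5.99  # table value for df = 2, alpha = 0.05

# Chi-square statistic: sum of (O - E)^2 / E.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Decision rule: reject the null hypothesis if the statistic exceeds the critical value.
reject_null = statistic > critical

# Upper-tail probability; this closed form holds for df = 2 only.
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.1f}, reject null: {reject_null}")
# chi-square = 10.6, reject null: True
print(p_value < 0.05)  # True
```

If SciPy happens to be available, `scipy.stats.chisquare(f_obs=observed, f_exp=expected)` should report the same statistic along with a p-value for any number of groups.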



In summary, questions of goodness of fit juxtapose observed patterns against hypothesized patterns to test overall and specific differences among them.
