Cross-Tabs Continued
Andrew Martin
PS 372
University of Kentucky
Statistical Independence
Statistical independence is a property of two variables in which the probability that an observation falls in a particular category of one variable and a particular category of the other variable equals the product of the simple or marginal probabilities of being in those categories.
Unlike the other statistical measures discussed in class, statistical independence indicators test for the absence of a relationship between two variables.
Statistical Independence
Assume two nominal variables, X and Y, with the following values:
X: a, b, c, ...
Y: r, s, t, ...
Statistical Independence
P(X=a) stands for the probability that a randomly selected case has property or value a on variable X.
P(Y=r) stands for the probability that a randomly selected case has property or value r on variable Y.
P(X=a, Y=r) stands for the joint probability that a randomly selected observation has both property a and property r simultaneously.
Statistical Independence
If X and Y are statistically independent:
P(X=a, Y=r) = [P(X=a)][P(Y=r)] for all a and r.
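The product rule can be checked mechanically, cell by cell. The sketch below is a minimal Python illustration, not part of the original slides: `is_independent` is a hypothetical helper, and the two joint-probability tables are made-up examples with categories a, b for X and r, s for Y. Exact fractions are used so the equality test is not disturbed by floating-point round-off.

```python
from fractions import Fraction


def is_independent(joint):
    """Return True if P(X=a, Y=r) == P(X=a) * P(Y=r) for every pair (a, r).

    `joint` maps (a, r) pairs to joint probabilities that sum to 1;
    missing pairs are treated as probability 0.
    """
    xs = {a for a, _ in joint}
    ys = {r for _, r in joint}
    # Marginal probabilities: sum the joint probabilities over the other variable.
    p_x = {a: sum(joint.get((a, r), 0) for r in ys) for a in xs}
    p_y = {r: sum(joint.get((a, r), 0) for a in xs) for r in ys}
    # Independence requires the product rule to hold in every cell.
    return all(joint.get((a, r), 0) == p_x[a] * p_y[r] for a in xs for r in ys)


F = Fraction
# Hypothetical table where the product rule holds in every cell.
independent = {("a", "r"): F(1, 6), ("a", "s"): F(1, 3),
               ("b", "r"): F(1, 6), ("b", "s"): F(1, 3)}
# Hypothetical table where knowing X tells you Y exactly.
dependent = {("a", "r"): F(1, 2), ("a", "s"): F(0),
             ("b", "r"): F(0), ("b", "s"): F(1, 2)}

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

In the first table P(X=a) = 1/2 and P(Y=r) = 1/3, so the joint probability 1/6 matches the product; in the second, 1/2 ≠ (1/2)(1/2), so one failing cell is enough to rule out independence.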
Statistical Independence
If gender and turnout are independent, the expected frequency of cell mv is:
$$E_{mv} = \frac{(\text{total obs in column } m)(\text{total obs in row } v)}{N}$$
Statistical Independence
$$E_{mv} = \frac{(\text{total obs in column } m)(\text{total obs in row } v)}{N} = \frac{210 \times 100}{300} = 70$$
70 is the expected frequency. Because the observed and expected frequencies are the same, the variables are independent.
$$\frac{150 \times 150}{300} = 75$$
Here the variables are not independent (that is, they are dependent), because the expected frequency (75) differs from the observed frequency (100).
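The expected-frequency calculation for a single cell is simple arithmetic. This sketch uses a hypothetical `expected_frequency` helper to reproduce the two worked cells, assuming N = 300 as on the slides and the observed counts stated in the text (70 for the first cell, 100 for the second). One matching cell is not by itself proof of independence for the whole table; that requires comparing every cell, which is what the chi-square test does.

```python
def expected_frequency(column_total, row_total, n):
    """Expected count for one cell if the two variables are independent."""
    return column_total * row_total / n


# The two worked cells (N = 300), paired with their observed counts.
for column_total, row_total, observed in [(210, 100, 70), (150, 150, 100)]:
    expected = expected_frequency(column_total, row_total, 300)
    verdict = "consistent with" if observed == expected else "departs from"
    print(f"observed {observed}, expected {expected:g}: {verdict} independence")
```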
Testing for Independence
How do we test for independence for an entire cross-tabulation table?
A statistic commonly used to test the statistical significance of a relationship in a cross-tabulation table is chi-square (χ²).
Chi-Square Statistic
The chi-square statistic essentially compares an observed result—the table produced by the data—with a hypothetical table that would occur if, in the population, the variables were statistically independent.
How is the chi-square statistic calculated?
The chi-square test is set up just like a hypothesis test: the observed chi-square value is compared with the critical value that marks off the critical region for the chosen level of significance.
A component is calculated for each cell of the cross-tabulation, using the same expected frequency as in the independence calculation.
How is the chi-square statistic calculated?
Each cell contributes:
$$\frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$$
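Summing the per-cell components over every cell gives the observed chi-square. The sketch below computes each expected frequency from the row and column totals and then adds up the components; the helper names and both 2 × 2 tables are hypothetical, not the gender and turnout data from the slides.

```python
def expected_table(observed):
    """Expected frequencies under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Observed chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 tables (rows = categories of Y, columns = categories of X).
dependent_table = [[30, 10],
                   [20, 40]]
independent_table = [[10, 20],
                     [30, 60]]

print(expected_table(dependent_table))        # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi_square(dependent_table), 2))  # 16.67
print(chi_square(independent_table))          # 0.0
```

When every observed count equals its expected count, every component is 0 and so is χ²; the further the table departs from independence, the larger χ² grows.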
Chi-Square Test
➲ The null hypothesis is statistical independence between X and Y.
➲ H0: X, Y independent
➲ The alternative hypothesis is that X and Y are not independent.
➲ HA: X, Y dependent
Chi-Square Test
➲ The chi-square distribution is a family of distributions, each of which depends on its degrees of freedom. The degrees of freedom equal the number of rows minus one times the number of columns minus one: (r-1)(c-1).
➲ Level of significance: the probability (α) of incorrectly rejecting a true null hypothesis.
Chi-Square Test
➲ Critical value: the chi-square test is always a one-tail test. Choose the critical value of chi-square from a tabulation to make the critical region (the region of rejection) equal to α.
➲ (JRM: Appendix C, pg. 577)
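Instead of reading the critical value from a printed table, it can be computed from the chi-square distribution. This sketch assumes SciPy is installed and uses `scipy.stats.chi2.ppf` (the inverse cumulative distribution function); the helper names are hypothetical. For a 5 × 2 table at α = .01 (the gun-control example on a later slide) it reproduces the tabulated 13.28.

```python
from scipy.stats import chi2


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a cross-tabulation: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value(alpha, n_rows, n_cols):
    """Upper-tail critical value: the area to its right under chi-square equals alpha."""
    return chi2.ppf(1 - alpha, degrees_of_freedom(n_rows, n_cols))


# A 5 x 2 table tested at the .01 level.
print(degrees_of_freedom(5, 2))              # 4
print(round(critical_value(0.01, 5, 2), 2))  # 13.28
```

Because the test is one-tailed, the whole α sits in the upper tail, which is why the function asks for the 1 − α quantile rather than splitting α in two.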
Chi-Square Test
➲ The observed χ² is the sum, over all cells, of the squared differences between observed and expected frequencies, each divided by the expected frequency.
➲ If χ²obs ≥ χ²crit, reject the null hypothesis. Otherwise, do not reject it.
Chi-Square Test
➲ Let's assume we want to test the relationship at the .01 level.
➲ The observed χ² is 62.21.
➲ The degrees of freedom are (5-1)(2-1) = 4.
➲ The critical χ² is 13.28.
➲ Since 62.21 > 13.28, we can reject the null hypothesis of independence.
➲ Y (attitudes toward gun control) is dependent on X (gender).
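An equivalent way to reach the same decision is to compare the p-value (the probability, if H0 is true, of a chi-square at least as large as the one observed) with α. This sketch assumes SciPy and uses `chi2.sf`, the upper-tail survival function, on the numbers from this example. The two rules always agree, because the critical value is defined by the same upper-tail area α.

```python
from scipy.stats import chi2

observed_stat = 62.21      # observed chi-square from the gun-control example
df = (5 - 1) * (2 - 1)     # 4 degrees of freedom
alpha = 0.01               # level of significance

critical = chi2.ppf(1 - alpha, df)    # about 13.28, the tabulated critical value
p_value = chi2.sf(observed_stat, df)  # P(chi-square >= 62.21) if H0 is true

# Both decision rules reject the null hypothesis of independence.
print(observed_stat >= critical)  # True
print(p_value < alpha)            # True
```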
Chi-Square Test
➲ The χ² statistic works for dependent variables that are ordinal or nominal measures, but another statistic is more appropriate for interval- and ratio-level data.
Analysis of Variance
For quantitative (interval- and ratio-level) dependent variables, analysis of variance is appropriate.
Analysis of variance is also known as ANOVA.
The independent variable, however, is generally still nominal or ordinal.
ANOVA tells political scientists ...
(1) if there are any differences among the means
(2) which specific means differ and by how much
(3) whether the observed differences in Y could have arisen by chance or whether they reflect real variation among the categories or groups in X
Two important concepts
Effect size: the difference between one mean and the other.
Difference of means test: the larger the difference of means (relative to its standard error), the more likely it is that the difference is not due to chance and instead reflects a relationship between the independent and dependent variables.
Setting up an Example
Suppose you want to test the effect of negative political ads on the intention to vote in the next election.
You set up a control group and a test group. Each group watches a newscast, but the test group also sees negative TV ads during the commercial breaks, while the control group watches the newscast without any campaign ads.
You give both groups a pre-test and a post-test to compare the effects of the two conditions.
Difference of the means
Effect = Mean (test group) – Mean (control group)
Difference of the means
Although different statistics use different formulas, every difference of means test shares two properties:
(1) The numerator is the difference of the means.
(2) The denominator is the standard error of the difference of the means.
Difference of the means
A difference of means test compares the means of two different samples. The larger the N of both samples, the more confident we can be that the observed difference in the sample (D) correctly estimates the population difference (∆).
Difference of the means
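In abstract terms:
$$\frac{\text{Mean (test group)} - \text{Mean (control group)}}{\sqrt{\text{Std. error (test)}^2 + \text{Std. error (control)}^2}}$$
In concrete terms:
$$\frac{\text{Mean (Ads)} - \text{Mean (No ads)}}{\sqrt{\text{Std. error (Ads)}^2 + \text{Std. error (No ads)}^2}}$$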
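A minimal sketch of this ratio for two independent samples, assuming the standard error of each mean is s/√n and the standard error of the difference is the square root of the sum of the two squared standard errors (the usual formula for independent samples). The helper names and the intention-to-vote scores are hypothetical; they are not data from the experiment described above.

```python
import math
from statistics import mean, stdev


def standard_error(sample):
    """Standard error of a sample mean: s / sqrt(n)."""
    return stdev(sample) / math.sqrt(len(sample))


def difference_of_means_ratio(test, control):
    """(mean of test - mean of control) / standard error of the difference."""
    # For independent samples the squared standard errors add.
    se_diff = math.sqrt(standard_error(test) ** 2 + standard_error(control) ** 2)
    return (mean(test) - mean(control)) / se_diff


# Hypothetical post-test intention-to-vote scores.
ads = [4, 5, 6, 5]
no_ads = [6, 7, 8, 7]

print(mean(ads) - mean(no_ads))                          # -2 (the effect)
print(round(difference_of_means_ratio(ads, no_ads), 3))  # -3.464
```

The numerator is the effect from the previous slide; dividing by the standard error expresses it in standard-error units, so the same raw difference counts for more when the samples are larger or less variable.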
Hypothesis Test of Means
A difference of means test evaluates the null hypothesis that there is no difference between the population means.
You can test the significance of a difference of means either by employing a hypothesis test or by calculating a confidence interval.
Small-Sample Test of DoM
Let's suppose we are looking at two samples: both measure the level of democracy in a country, but one is a sample of developed countries and the other is a sample of developing countries.
We want to test whether the level of economic development affects the level of openness in a democracy. Specifically, we want to test whether the population difference is 0.
Small-Sample Test of DoM
Because both samples are small, we must use the t distribution and calculate the degrees of freedom.
When there are two samples, the degrees of freedom equal $N_1 + N_2 - 2$, where $N_1$ is the size of the first sample and $N_2$ the size of the second.
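For small samples, one common version of the test pools the two sample variances, and that pooled estimate is what carries the N₁ + N₂ − 2 degrees of freedom. The sketch below assumes equal population variances (the pooled t-test); `pooled_t_test` is a hypothetical helper and the democracy scores are invented, so the result will not match the .144 standard error reported on the slide. The t statistic would then be compared with a critical value from the t distribution with `df` degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample1, sample2):
    """Pooled two-sample t statistic and its degrees of freedom (N1 + N2 - 2)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: combined within-sample sums of squares divided by df.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    # Standard error of the difference of means under the pooled variance.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / se_diff
    return t, df


# Hypothetical democracy scores for small samples of countries.
developed = [8, 9, 7, 9, 8]
developing = [5, 6, 7, 4, 6, 5]

t, df = pooled_t_test(developed, developing)
print(df)           # 9
print(round(t, 2))  # 4.64
```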
The standard error for the difference of means is .144.
ANOVA
ANOVA, or analysis of variance, extends the difference of means test to comparisons among more than two groups.
This procedure treats the observations in each category of the explanatory (independent) variable as independent samples from populations.
This makes it possible to test hypotheses such as H0: μ1 = μ2 = μ3.
Variation
There are three types of variation.
Total variation: a quantitative measure of the variation in a variable, determined by summing the squared deviation of each observation from the mean.
Variation
Explained variation: the portion of the total variation in a dependent variable explained by variation in the independent variable.
Unexplained variation: the portion of the total variation in a dependent variable that is not accounted for by variation in the independent variable(s).
ANOVA
Total variation = Within-group variation + Between-group variation
Within-group variation is the unexplained variation.
Between-group variation is the explained variation.
ANOVA
Explained variation refers to the fact that some of the observed differences seem to be due to “membership in” or “having a property of” one category of X.
On average, the A's differ from the B's. Knowing this characteristic can help us predict the value of Y.
ANOVA
Percent explained = (between / total) × 100
Ex: if the between variation is 0, percent explained = (0 / total) × 100 = 0.
ANOVA
ANOVA involves quantifying the types of variation and using the numbers to make inferences.
The standard measure of variation is the sum of squares, which is a total of squared deviations about the mean.
ANOVA
TSS = BSS + WSS
TSS = total sum of squares
BSS = between-group sum of squares (between-mean variability)
WSS = within-group sum of squares (within-group variability)
Percent explained = (BSS/TSS) × 100
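The decomposition can be checked numerically. This sketch computes the three sums of squares for made-up Y values in three categories of a hypothetical X, confirms that TSS = BSS + WSS, and reports the percent explained. `sums_of_squares` is a hypothetical helper, not a library function.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (TSS, BSS, WSS) for a list of groups of observations on Y."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    # Total: squared deviations of every observation from the grand mean.
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between: squared deviations of each group mean from the grand mean, weighted by group size.
    bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared deviations of each observation from its own group mean.
    wss = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return tss, bss, wss


# Hypothetical Y values for three categories of X.
groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

tss, bss, wss = sums_of_squares(groups)
print(tss, bss, wss)    # 60 54 6
print(tss == bss + wss) # True
print(100 * bss / tss)  # 90.0
```

Here group membership accounts for 54 of the 60 units of total variation, so the percent explained is 90.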
ANOVA
The proportion of variation explained (BSS/TSS) is called eta-squared (η²); multiplied by 100, it gives the percent explained.
It varies between 0 and 1 like any proportion.
0 means the independent variable explains nothing about the dependent variable.
1 means the independent variable explains all variation in the dependent variable.
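The two endpoints can be illustrated with hypothetical data: if every group has the same mean, BSS = 0 and η² = 0; if there is no variation within any group, WSS = 0 and η² = 1. The `eta_squared` helper below is a sketch and assumes Y varies at all (TSS > 0); otherwise η² is undefined and the division fails.

```python
from statistics import mean


def eta_squared(groups):
    """Eta-squared = BSS / TSS; assumes Y varies (TSS > 0)."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return bss / tss


# Every group has the same mean (3), so X explains nothing about Y.
print(eta_squared([[1, 5], [2, 4], [3, 3]]))  # 0.0
# No variation within any group, so X explains all the variation in Y.
print(eta_squared([[2, 2], [5, 5], [8, 8]]))  # 1.0
```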
ANOVA
Often this statistic is interpreted as follows:
X explains 60 percent of the variation in Y and hence is an important explanatory factor (if η² = .6).