Chi Square & Correlation - University of Texas at El Paso (utminers.utep.edu/crboehmer/chi square and...)
TRANSCRIPT
-
Chi Square & Correlation
-
Nonparametric Test of Chi2
Used when too many of the assumptions of t-tests are violated:
Sample size is too small to reflect the population.
Data are not continuous and thus not appropriate for parametric tests based on normal distributions.
X2 is another way of showing that some pattern in the data was not created randomly by chance.
X2 can be one- or two-dimensional.
X2 deals with the question of whether what we observed is different from what is expected.
-
Calculating X2
What would a contingency table look like if no relationship exists between gender and voting for Bush? (i.e. statistical independence)
                   Male   Female   Total
Voted for Bush       25       25      50
Voted for Kerry      25       25      50
Total                50       50     100
NOTE: INDEPENDENT VARIABLES ON COLUMNS AND DEPENDENT ON ROWS
-
Calculating X2
What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?
                   Male   Female
Voted for Bush       50        0
Voted for Kerry       0       50
-
Calculating the expected value
$$\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$$
\hat{f}_{ij} = The expected frequency of the cell in the ith row and jth column
f_i = The total in the ith row marginal
f_j = The total in the jth column marginal
N = The grand total, or sample size for the entire table
Expected Voted for Bush = 50 x 50 / 100 = 25
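The expected-value rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the list-of-rows layout are assumptions, not part of the slides. It applies (row total x column total) / N to the perfect-relationship table.

```python
def expected_counts(observed):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: voted for Bush, voted for Kerry. Columns: male, female.
perfect = [[50, 0], [0, 50]]
expected = expected_counts(perfect)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

Every cell has an expected count of 25 because every row and column total is 50 and N is 100.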
-
Nonparametric Test of Chi2
Again, the basic question is whether what you are observing in some given data was created by chance or through some systematic process.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed frequency
E = Expected frequency
-
Nonparametric Test of Chi2
The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (Ho: B = K). Our research hypothesis is that they are not equal (Ha: B ≠ K).
Given the sample size, how many cases could we expect in each category (n / number of categories)? Comparing the obtained value with the critical value gives a coefficient (X2) and a probability (Pr.) that the results are random.
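A one-dimensional X2 of this kind might be sketched as follows. The function name `one_way_chi_square` is made up for illustration. The 50/50 split uses the row totals from the earlier Bush/Kerry table, while the 60/40 split is a purely hypothetical input that does not come from the slides.

```python
def one_way_chi_square(observed):
    """X2 for Ho: all categories are equally likely (expected = n / number of categories)."""
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Row totals from the earlier table: 50 voted for Bush, 50 for Kerry.
equal_split = one_way_chi_square([50, 50])

# Hypothetical counts, for illustration only (not from the slides).
hypothetical_split = one_way_chi_square([60, 40])

print(equal_split, hypothetical_split)  # 0.0 4.0
```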
-
Let’s do a X2
(50 - 25)^2 / 25 = 25
(0 - 25)^2 / 25 = 25
(0 - 25)^2 / 25 = 25
(50 - 25)^2 / 25 = 25
X2 = 100
                   Male   Female
Voted for Bush       50        0
Voted for Kerry       0       50
What would X2 be when there is statistical independence?
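The hand calculation can be checked with a short Python sketch. The function name `chi_square` and the list-of-rows layout are assumptions made for this example. It sums (O - E)^2 / E over all cells, with E taken from the row and column totals, and is applied to both the perfect-relationship table and the independence table.

```python
def chi_square(observed):
    """Pearson X2 for a two-way table: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            x2 += (o - e) ** 2 / e
    return x2

perfect_x2 = chi_square([[50, 0], [0, 50]])        # perfect relationship
independent_x2 = chi_square([[25, 25], [25, 25]])  # statistical independence
print(perfect_x2, independent_x2)  # 100.0 0.0
```

Under independence every observed count equals its expected count, so each term is zero.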
-
Let’s corroborate with SPSS
Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .000(b)    1    1.000
Continuity Correction(a)       .000       1    1.000
Likelihood Ratio               .000       1    1.000
Fisher's Exact Test                                          1.000        .579
Linear-by-Linear Association   .000       1    1.000
N of Valid Cases               100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.
Chi-Square Tests
                               Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             100.000(b)   1    .000
Continuity Correction(a)       96.040       1    .000
Likelihood Ratio               138.629      1    .000
Fisher's Exact Test                                            .000         .000
Linear-by-Linear Association   99.000       1    .000
N of Valid Cases               100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.
-
Testing for significance
How do we know if the relationship is statistically significant?
We need to know the df: df = (R-1)(C-1) = (2-1)(2-1) = 1
We go to the X2 distribution to look for the critical value (CV = 3.84 at the .05 level).
                   Male   Female
Voted for Bush       20       30
Voted for Kerry      30       20
X2 = 4
Because X2 = 4 exceeds the critical value of 3.84, we conclude that the relationship between gender and voting is statistically significant.
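The significance check for this table can be sketched in Python using only the standard library. The helper names are assumptions made for illustration. The critical value 3.84 is taken from the slide, and the p-value uses the fact that, for 1 degree of freedom only, the upper-tail probability of X2 equals erfc(sqrt(X2 / 2)).

```python
import math

def chi_square(observed):
    """Pearson X2 for a two-way table: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )

def p_value_df1(x2):
    """Upper-tail probability for a chi-square variable with exactly 1 df."""
    return math.erfc(math.sqrt(x2 / 2))

observed = [[20, 30], [30, 20]]  # rows: Bush, Kerry; columns: male, female
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R-1)(C-1)
x2 = chi_square(observed)
CRITICAL_VALUE = 3.84  # from the slide: df = 1, .05 level
significant = x2 > CRITICAL_VALUE
p_value = p_value_df1(x2)
print(df, x2, significant, round(p_value, 4))
```

The p-value comes out just under .05, which agrees with X2 = 4 sitting only slightly above the critical value.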
-
When is X2 appropriate to use?
X2 is perhaps the most widely used statistical technique to analyze nominal and ordinal data.
Nominal x nominal (gender and voting preferences)
Nominal x ordinal (gender and opinion of W)
-
X2 can also be used with larger tables
Opinion of Bush    MALE         FEMALE       Total
Favorable          40 (19.4)     5 (15.8)       45
Indifferent        10 (.88)     20 (.72)        30
Unfavorable        15 (8.6)     55 (6.9)        70
Total              65           80             145
(Values in parentheses are each cell's contribution to X2.)
X2 = 52.3
Do we reject the null hypothesis?
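A sketch of the same calculation for the 3 x 2 table follows. The assignment of columns to male and female follows one reading of the flattened slide and is an assumption, though X2 does not depend on column order. The critical value 5.99 (df = 2, .05 level) is a standard table value not given in the slides, and exp(-X2 / 2) is the exact upper-tail probability only when df = 2.

```python
import math

# Rows: Favorable, Indifferent, Unfavorable. Columns: male, female (assumed order).
observed = [[40, 5], [10, 20], [15, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Each cell's contribution to X2: (O - E)^2 / E
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
x2 = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3-1)(2-1) = 2
CRITICAL_VALUE = 5.99       # standard table value for df = 2 at the .05 level
p_value = math.exp(-x2 / 2)  # exact upper-tail probability for df = 2 only
reject_null = x2 > CRITICAL_VALUE
print(round(x2, 2), df, reject_null)
```

The unrounded total is close to 52.4. The slide's 52.3 is what you get by adding the six cell contributions after rounding them. Either way X2 is far above the critical value, so the null hypothesis is rejected.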
-
Correlation (Does not mean causation)
We want to know how two variables are related to each other.
Does eating doughnuts affect weight?
Does spending more hours studying increase test scores?
Correlation means how much two variables overlap with each other.
-
Types of Correlations
X (cause)                Y (effect)        Correlation   Values
Increases                Increases         Positive      0 to 1
Decreases                Decreases         Positive      0 to 1
Increases                Decreases         Negative      -1 to 0
Decreases                Increases         Negative      -1 to 0
Increases or decreases   Does not change   Independent   0
-
Conceptualizing Correlation
[Figure: Measuring Development, contrasting weak and strong correlation. Labels: GDP, POP, WEIGHT, GDP, EDUCATION]
Correlation will be associated with what type of validity?
-
Correlation Coefficient
$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
-
Home Value & Square footage
Log value   Log sqft   value2     sqft2      Val * sqft
5.13        4.02       26.3169    16.1604    20.6226
5.2         4.54       27.04      20.6116    23.608
4.53        3.53       20.5209    12.4609    15.9909
4.79        3.8        22.9441    14.44      18.202
4.78        3.86       22.8484    14.8996    18.4508
4.72        4.17       22.2784    17.3889    19.6824
29.15       23.92      141.95     95.96      116.56     (totals)
-
Correlation Coefficient
$$r_{xy} = \frac{(6 \times 116.56) - (29.15)(23.92)}{\sqrt{[(141.95 \times 6) - (29.15)^2][(95.96 \times 6) - (23.92)^2]}} = \frac{2.09}{2.66} = .78$$
Correlations
                                VALUE    SQFT
VALUE   Pearson Correlation     1        .778
        Sig. (2-tailed)         .        .068
        N                       6        6
SQFT    Pearson Correlation     .778     1
        Sig. (2-tailed)         .068     .
        N                       6        6
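The worked example can be reproduced with a small Python sketch of the computational formula. The function name `pearson_r` is an assumption, and the six pairs are the logged values and square footages from the data table as printed, rounded to two decimals.

```python
import math

# Six observations from the data table (logged, rounded to two decimals).
log_value = [4.72, 4.78, 4.79, 4.53, 5.20, 5.13]
log_sqft = [4.17, 3.86, 3.80, 3.53, 4.54, 4.02]

def pearson_r(x, y):
    """r = (n*SumXY - SumX*SumY) / sqrt([n*SumX2 - (SumX)^2][n*SumY2 - (SumY)^2])."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(log_value, log_sqft)
print(round(r, 3))
```

Working from unrounded sums gives a value that agrees with the SPSS output to three decimals. The hand calculation differs slightly in the third decimal because it uses sums rounded to two places.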
-
Rules of Thumb
Size of correlation coefficient   General Interpretation
.8 - 1.0                          Very Strong
.6 - .8                           Strong
.4 - .6                           Moderate
.2 - .4                           Weak
.0 - .2                           Very Weak or no relationship
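These rules of thumb might be encoded as a small lookup. Two choices here are assumptions, since the slide leaves them open: a value that sits exactly on a boundary (for example .4) is placed in the higher band, and negative coefficients are judged by their absolute size. The function name `interpret_r` is hypothetical.

```python
def interpret_r(r):
    """Map a correlation coefficient to the slide's rule-of-thumb label."""
    size = abs(r)  # assumption: strength is judged by absolute size
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.2:
        return "Very Weak or no relationship"
    if size < 0.4:
        return "Weak"
    if size < 0.6:
        return "Moderate"
    if size < 0.8:
        return "Strong"
    return "Very Strong"

label = interpret_r(0.778)  # the home value / square footage coefficient
print(label)  # Strong
```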
-
Multiple Correlation Coefficients
Correlations
                                VALUE     SQFT      BTH       BDR
VALUE   Pearson Correlation     1         .784**    .775**    .708**
        Sig. (2-tailed)         .         .000      .000      .000
        N                       46        46        46        46
SQFT    Pearson Correlation     .784**    1         .669**    .654**
        Sig. (2-tailed)         .000      .         .000      .000
        N                       46        46        46        46
BTH     Pearson Correlation     .775**    .669**    1         .895**
        Sig. (2-tailed)         .000      .000      .         .000
        N                       46        46        46        46
BDR     Pearson Correlation     .708**    .654**    .895**    1
        Sig. (2-tailed)         .000      .000      .000      .
        N                       46        46        46        46
**. Correlation is significant at the 0.01 level (2-tailed).
-
Limitation of correlation coefficients
They tell us how strongly two variables are related.
However, r coefficients are limited because they cannot tell us anything about:
1. Causation between X and Y
2. Marginal impact of X on Y
3. What percentage of the variation of Y is explained by X
4. Forecasting
Because of the above, Ordinary Least Squares (OLS) is most useful.
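To make the contrast concrete, here is a minimal sketch of a bivariate OLS fit. It reuses the six logged home-value and square-footage pairs from the earlier data table, which is a choice made for illustration, and the function name `ols_fit` is hypothetical. The slope estimates the marginal impact of X on Y, R-squared estimates the share of variation explained, and the fitted line produces a forecast.

```python
# Six observations from the earlier data table (logged, rounded to two decimals).
log_sqft = [4.17, 3.86, 3.80, 3.53, 4.54, 4.02]   # X
log_value = [4.72, 4.78, 4.79, 4.53, 5.20, 5.13]  # Y

def ols_fit(x, y):
    """Least-squares intercept, slope, and R-squared for Y = alpha + beta * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx                       # marginal impact of X on Y
    alpha = mean_y - beta * mean_x         # intercept
    r_squared = sxy ** 2 / (sxx * syy)     # share of variation in Y explained by X
    return alpha, beta, r_squared

alpha, beta, r_squared = ols_fit(log_sqft, log_value)
forecast = alpha + beta * 4.0  # predicted log value at a log sqft of 4.0
print(round(alpha, 3), round(beta, 3), round(r_squared, 3), round(forecast, 3))
```

With only six cases these estimates are rough. The sketch is meant to show what OLS adds over r alone and is not a finished model.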
-
Do you have the BLUES?
B for Best (minimum error)
L for Linear (the form of the relationship)
U for Unbiased (does the parameter truly reflect the effect?)
E for Estimator
-
Home value and sq. Feet
[Figure: scatterplot with SQFT on the x-axis (3.4 to 4.6) and VALUE on the y-axis (4.5 to 5.3), with a fitted line]
$$Y = \alpha + \beta X + \varepsilon$$
Does the above line meet the BLUE criteria?