Chi Square & Correlation - University of Texas at El Paso (utminers.utep.edu/crboehmer/chi square and...)


  • Chi Square & Correlation

  • Nonparametric Test of χ²

    Used when too many of the assumptions behind t-tests are violated:

    The sample size is too small to reflect the population.

    The data are not continuous and thus not appropriate for parametric tests based on normal distributions.

    χ² is another way of showing that some pattern in the data was not created randomly by chance.

    χ² can be one- or two-dimensional.

    χ² deals with the question of whether what we observed is different from what is expected.

  • Calculating χ²

    What would a contingency table look like if no relationship exists between gender and voting for Bush? (i.e. statistical independence)

                        Male    Female    Total
    Voted for Bush       25       25        50
    Voted for Kerry      25       25        50
    Total                50       50       100

    NOTE: INDEPENDENT VARIABLES ON COLUMNS AND DEPENDENT ON ROWS

  • Calculating χ²

    What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?

                        Male    Female
    Voted for Bush       50        0
    Voted for Kerry       0       50

  • Calculating the expected value

    $\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$

    $\hat{f}_{ij}$ = The expected frequency of the cell in the ith row and jth column

    $f_i$ = The total in the ith row marginal

    $f_j$ = The total in the jth column marginal

    $N$ = The grand total, or sample size for the entire table

    Expected count for males who voted for Bush = (50 × 50) / 100 = 25
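
    The expected-value formula can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the nested-list table layout (rows are vote choices, columns are genders) are assumptions, not part of the slides.

```python
def expected_counts(observed):
    """Expected frequency of each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: voted for Bush, voted for Kerry. Columns: male, female.
perfect_relationship = [[50, 0], [0, 50]]

# Every marginal is 50 and N is 100, so each expected count is 50 * 50 / 100.
print(expected_counts(perfect_relationship))
```

    Because the expected counts depend only on the marginals, the perfect-relationship table and the independence table share the same expected value of 25 in every cell.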

  • Nonparametric Test of χ²

    Again, the basic question is whether what you observe in some given data was created by chance or through some systematic process.

    $\chi^2 = \sum \frac{(O - E)^2}{E}$

    O = Observed frequency, E = Expected frequency

  • Nonparametric Test of χ²

    The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (Ho: B = K). Our research hypothesis is that they are not equal (Ha: B ≠ K).

    Given the sample size, how many cases could we expect in each category (n / number of categories)? Comparing the obtained value with the critical value provides a coefficient and a probability (Pr.) that the results are random.
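
    For the one-dimensional case, a minimal sketch might look like the following. The vote counts are hypothetical (the slides give no one-way example), and the function name `one_way_chi_square` is an assumption made for illustration.

```python
def one_way_chi_square(counts):
    """Chi-square against equal expected counts (n / number of categories)."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Hypothetical sample of 100 voters: 60 for Bush, 40 for Kerry.
hypothetical_votes = [60, 40]

# Expected count per category is 100 / 2 = 50.
print(one_way_chi_square(hypothetical_votes))  # 4.0
```

    An even split such as `[50, 50]` gives 0, which is what the null hypothesis B = K predicts.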

  • Let’s do a χ²

                        Male    Female
    Voted for Bush       50        0
    Voted for Kerry       0       50

    (50 - 25)² / 25 = 25

    (0 - 25)² / 25 = 25

    (0 - 25)² / 25 = 25

    (50 - 25)² / 25 = 25

    χ² = 100

    What would χ² be when there is statistical independence?
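
    The same arithmetic can be checked with a short Python sketch. The function name `chi_square` and the nested-list layout are assumptions for illustration; the two tables are the perfect-relationship and independence tables from the slides.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            total += (o - e) ** 2 / e
    return total

print(chi_square([[50, 0], [0, 50]]))    # perfect relationship: 100.0
print(chi_square([[25, 25], [25, 25]]))  # statistical independence: 0.0
```

    Under independence every observed count equals its expected count, so each term is zero and χ² is 0.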

  • Let’s corroborate with SPSS

    Chi-Square Tests (statistical independence table)

                                    Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                   (2-sided)     (2-sided)    (1-sided)
    Pearson Chi-Square              .000(b)   1    1.000
    Continuity Correction(a)        .000      1    1.000
    Likelihood Ratio                .000      1    1.000
    Fisher's Exact Test                                          1.000        .579
    Linear-by-Linear Association    .000      1    1.000
    N of Valid Cases                100

    a. Computed only for a 2x2 table

    b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.

    Chi-Square Tests (perfect relationship table)

                                    Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                      (2-sided)     (2-sided)    (1-sided)
    Pearson Chi-Square              100.000(b)   1    .000
    Continuity Correction(a)        96.040       1    .000
    Likelihood Ratio                138.629      1    .000
    Fisher's Exact Test                                             .000         .000
    Linear-by-Linear Association    99.000       1    .000
    N of Valid Cases                100

    a. Computed only for a 2x2 table

    b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.

  • Testing for significance

    How do we know if the relationship is statistically significant?

    We need to know the degrees of freedom: df = (R-1)(C-1) = (2-1)(2-1) = 1.

    We go to the χ² distribution to look for the critical value (CV = 3.84 at the .05 level for df = 1).

                        Male    Female
    Voted for Bush       20       30
    Voted for Kerry      30       20

    χ² = 4

    Because χ² = 4 exceeds the critical value of 3.84, we conclude that the relationship between gender and voting is statistically significant.
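
    A hedged sketch of this significance check in Python follows. The variable names are assumptions, and the p-value shortcut `erfc(sqrt(χ²/2))` is a standard identity that holds only for df = 1; it is added here for illustration and does not appear in the slides.

```python
import math

# Rows: voted for Bush, voted for Kerry. Columns: male, female.
observed = [[20, 30], [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# df = (R - 1)(C - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi2 = sum(
    (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, o in enumerate(row)
)

CRITICAL_VALUE = 3.84  # chi-square critical value for df = 1 at the .05 level
p_value = math.erfc(math.sqrt(chi2 / 2))  # valid only when df = 1

print(df, chi2, chi2 > CRITICAL_VALUE, round(p_value, 4))
```

    With these counts χ² is 4, which is above 3.84, and the p-value is a little under .05, so both routes lead to the same decision.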

  • When is χ² appropriate to use?

    χ² is perhaps the most widely used statistical technique for analyzing nominal and ordinal data.

    Nominal × nominal (gender and voting preferences)

    Nominal × ordinal (gender and opinion of W)

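  • χ² can also be used with larger tables

    Opinion of Bush    MALE         FEMALE       Total
    Favorable          40 (19.4)     5 (15.8)      45
    Indifferent        10 (.88)     20 (.72)       30
    Unfavorable        15 (8.6)     55 (6.9)       70
    Total              65           80            145

    Values in parentheses are each cell's contribution to χ², that is (O - E)² / E.

    χ² = 52.3. Do we reject the null hypothesis?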
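
    The 3 × 2 opinion table can be worked through the same way. The sketch below is an illustration under assumptions: the dictionary layout and variable names are made up here, and the critical value 5.99 (df = 2, .05 level) comes from a standard χ² table rather than from the slides.

```python
# Observed counts as (male, female) for each opinion of Bush.
observed = {
    "Favorable": (40, 5),
    "Indifferent": (10, 20),
    "Unfavorable": (15, 55),
}

col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(col_totals)

contributions = {}
for opinion, counts in observed.items():
    row_total = sum(counts)
    # Each cell contributes (O - E)^2 / E, with E = row total * column total / N.
    contributions[opinion] = tuple(
        (o - row_total * col_totals[j] / n) ** 2 / (row_total * col_totals[j] / n)
        for j, o in enumerate(counts)
    )

chi2 = sum(sum(cells) for cells in contributions.values())
df = (len(observed) - 1) * (len(col_totals) - 1)
CRITICAL_VALUE = 5.99  # df = 2 at the .05 level, from a standard table

print(round(chi2, 2), df, chi2 > CRITICAL_VALUE)
```

    Without rounding the individual cells, χ² comes out near 52.4; the slide's 52.3 is the sum of the rounded cell values. Either figure is far above 5.99, so the null hypothesis is rejected.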

  • Correlation (Does not mean causation)

    We want to know how two variables are related to each other.

    Does eating doughnuts affect weight? Does spending more hours studying increase test scores?

    Correlation means how much two variables overlap with each other.

  • Types of Correlations

    X (cause)                 Y (effect)         Correlation    Values
    Increases                 Increases          Positive       0 to 1
    Decreases                 Decreases          Positive       0 to 1
    Increases                 Decreases          Negative       -1 to 0
    Decreases                 Increases          Negative       -1 to 0
    Increases or decreases    Does not change    Independent    0

  • Conceptualizing Correlation

    [Figure: Measuring Development, contrasting a weak and a strong correlation. Labels: GDP, POP WEIGHT, GDP, EDUCATION]

    Correlation will be associated with what type of validity?

  • Correlation Coefficient

    $r_{xy} = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$
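
    As a sketch, the computational formula for r translates directly into Python. The function name `pearson_r` and the tiny data sets are hypothetical, chosen only so the result is easy to verify by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from raw sums, following the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical data: Y rises exactly in step with X, then falls exactly in step.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

    The function divides by zero if either variable is constant, which matches the fact that r is undefined in that case.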

  • Home Value & Square footage

    Log value    Log sqft    value²      sqft²      Val * sqft
    5.13         4.02        26.3169     16.1604    20.6226
    5.2          4.54        27.04       20.6116    23.608
    4.53         3.53        20.5209     12.4609    15.9909
    4.79         3.8         22.9441     14.44      18.202
    4.78         3.86        22.8484     14.8996    18.4508
    4.72         4.17        22.2784     17.3889    19.6824
    29.15        23.92       141.95      95.96      116.56     (column totals)

  • Correlation Coefficient

    $r_{xy} = \frac{(6 \times 116.56) - (29.15)(23.92)}{\sqrt{[(6 \times 141.95) - (29.15)^2][(6 \times 95.96) - (23.92)^2]}} = \frac{2.09}{2.66} = .78$

    Correlations

                                    VALUE    SQFT
    VALUE   Pearson Correlation     1        .778
            Sig. (2-tailed)         .        .068
            N                       6        6
    SQFT    Pearson Correlation     .778     1
            Sig. (2-tailed)         .068     .
            N                       6        6
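
    The hand calculation can be reproduced from the six log value and log sqft pairs in the Home Value & Square footage table. This is a sketch with assumed variable names; it works from the unrounded sums, so it lands on the SPSS figure rather than on the rounded hand result.

```python
import math

# The six observations from the Home Value & Square footage slide.
log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]
log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]

n = len(log_value)
sum_value = sum(log_value)                              # about 29.15
sum_sqft = sum(log_sqft)                                # about 23.92
sum_value2 = sum(v * v for v in log_value)              # about 141.95
sum_sqft2 = sum(s * s for s in log_sqft)                # about 95.96
sum_product = sum(v * s for v, s in zip(log_value, log_sqft))  # about 116.56

numerator = n * sum_product - sum_value * sum_sqft
denominator = math.sqrt(
    (n * sum_value2 - sum_value ** 2) * (n * sum_sqft2 - sum_sqft ** 2)
)
r = numerator / denominator

print(round(r, 3))
```

    Carrying full precision gives r of about .778, matching the SPSS output; plugging in the two-decimal column totals instead gives roughly .78 to .79, which is why the hand result differs slightly.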

  • Rules of Thumb

    Size of correlation coefficient    General Interpretation
    .8 - 1.0                           Very Strong
    .6 - .8                            Strong
    .4 - .6                            Moderate
    .2 - .4                            Weak
    .0 - .2                            Very Weak or no relationship
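
    These rules of thumb can be encoded as a small lookup. The sketch below makes two assumptions that the slide leaves open: the sign of r is ignored (only its size matters for strength), and a value that falls exactly on a boundary such as .4 is assigned to the higher band. The function name `interpret_r` is hypothetical.

```python
def interpret_r(r):
    """Map a correlation coefficient to the rule-of-thumb label for its size."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.2:
        return "Very Weak or no relationship"
    if size < 0.4:
        return "Weak"
    if size < 0.6:
        return "Moderate"
    if size < 0.8:
        return "Strong"
    return "Very Strong"

print(interpret_r(0.778))   # Strong
print(interpret_r(-0.895))  # Very Strong
print(interpret_r(0.1))     # Very Weak or no relationship
```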

  • Multiple Correlation Coefficients

    Correlations

                                    VALUE     SQFT      BTH       BDR
    VALUE   Pearson Correlation     1         .784**    .775**    .708**
            Sig. (2-tailed)         .         .000      .000      .000
            N                       46        46        46        46
    SQFT    Pearson Correlation     .784**    1         .669**    .654**
            Sig. (2-tailed)         .000      .         .000      .000
            N                       46        46        46        46
    BTH     Pearson Correlation     .775**    .669**    1         .895**
            Sig. (2-tailed)         .000      .000      .         .000
            N                       46        46        46        46
    BDR     Pearson Correlation     .708**    .654**    .895**    1
            Sig. (2-tailed)         .000      .000      .000      .
            N                       46        46        46        46

    **. Correlation is significant at the 0.01 level (2-tailed).

  • Limitation of correlation coefficients

    They tell us how strongly two variables are related.

    However, r coefficients are limited because they cannot tell us anything about:

    1. Causation between X and Y

    2. Marginal impact of X on Y

    3. What percentage of the variation of Y is explained by X

    4. Forecasting

    Because of the above, Ordinary Least Squares (OLS) is most useful.

  • Do you have the BLUES?

    B for Best (Minimum error)

    L for Linear (The form of the relationship)

    U for Unbiased (Does the parameter truly reflect the effect?)

    E for Estimator

  • Home value and sq. Feet

    [Scatter plot with a fitted line: VALUE on the vertical axis (4.5 to 5.3) against SQFT on the horizontal axis (3.4 to 4.6)]

    $Y = \alpha + \beta X + \varepsilon$

    Does the above line meet the BLUE criteria?
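
    To make the line Y = α + βX + ε concrete, here is a minimal least-squares sketch. It assumes the six log value and log sqft pairs from the Home Value & Square footage slide can stand in for the plotted data (the plot itself may be based on a larger sample), and the variable names are hypothetical.

```python
# Six observations from the Home Value & Square footage slide (an assumption:
# the scatter plot may have been drawn from a larger sample).
log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]    # X
log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]   # Y

n = len(log_sqft)
mean_x = sum(log_sqft) / n
mean_y = sum(log_value) / n

# Sums of squares and cross-products around the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sqft, log_value))
s_xx = sum((x - mean_x) ** 2 for x in log_sqft)
s_yy = sum((y - mean_y) ** 2 for y in log_value)

beta = s_xy / s_xx                      # slope: marginal impact of X on Y
alpha = mean_y - beta * mean_x          # intercept
r_squared = s_xy ** 2 / (s_xx * s_yy)   # share of the variation in Y explained by X

print(round(alpha, 3), round(beta, 3), round(r_squared, 3))
```

    Unlike r alone, the fitted slope gives the marginal impact of X on Y, r² gives the share of variation explained, and α + βX can be used to forecast Y for a new X.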
