Association: Predicting One Variable from Another

Upload: yanni

Post on 24-Feb-2016


Page 1: Association

Association

Predicting One Variable from Another

Page 2: Association

Correlation

• Usually refers to Pearson’s r computed on two interval/ratio scale variables.
• It measures the degree to which variance in one variable is “explained” by a second variable (the proportion explained is r²)
• It measures the strength of a linear relationship between the variables

Page 3: Association

Definition of r

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
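The definition can be checked numerically. The following is a minimal sketch in R, assuming two small made-up vectors `x` and `y` (they are not from the slides); it computes r term by term and compares the result with the built-in `cor()`.

```r
# Illustrative data (made up for this sketch, not from the slides)
x <- c(2, 4, 5, 7, 9, 10)
y <- c(1, 3, 4, 8, 9, 12)

# Deviations of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r computed directly from the definition
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point rounding.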

Page 4: Association

Properties of r

• r is symmetrical and varies from -1 to +1
• 0 indicates no linear correlation or relationship
• ±1 indicates a perfect correlation (knowledge of one variable makes it possible to predict the second one without any error).
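These properties are easy to see on a toy example. The sketch below uses made-up vectors (an assumption for illustration only) to show that swapping the arguments of `cor()` leaves r unchanged, and that an exact linear function of x gives r = +1 or r = -1 depending on the sign of the slope.

```r
# Illustrative data (made up for this sketch)
x <- c(2, 4, 5, 7, 9, 10)
y <- c(1, 3, 4, 8, 9, 12)

# Symmetry: the order of the variables does not matter
r_xy <- cor(x, y)
r_yx <- cor(y, x)

# Perfect linear relationships: y is an exact linear function of x
r_pos <- cor(x, 3 + 2 * x)   # positive slope
r_neg <- cor(x, 3 - 2 * x)   # negative slope

print(c(r_xy = r_xy, r_yx = r_yx, r_pos = r_pos, r_neg = r_neg))
```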

Page 5: Association

Properties of r²

• r² is symmetrical and varies from 0 to 1
• r² is the proportion of the variability in one variable that is “explained by” the other variable
• cor.test(x, y, method="pearson")
• cor(x, y, method="pearson")
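One way to see the "proportion explained" reading is to compare r² with the R-squared of a simple linear regression. The sketch below assumes made-up vectors `x` and `y` and uses `lm()` only as an illustration; it is not part of the original slides.

```r
# Illustrative data (made up for this sketch)
x <- c(2, 4, 5, 7, 9, 10)
y <- c(1, 3, 4, 8, 9, 12)

r  <- cor(x, y, method = "pearson")
r2 <- r^2

# R-squared from regressing y on x, and x on y
r2_y_on_x <- summary(lm(y ~ x))$r.squared
r2_x_on_y <- summary(lm(x ~ y))$r.squared

print(c(r2 = r2, y_on_x = r2_y_on_x, x_on_y = r2_x_on_y))

# cor.test() reports r together with a significance test
# and a confidence interval
print(cor.test(x, y, method = "pearson"))
```

All three r² values should coincide, which illustrates the symmetry stated above.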

Page 6: Association
Page 7: Association
Page 8: Association
Page 9: Association
Page 10: Association
Page 11: Association
Page 12: Association

Spearman’s rho

• For rank/ordinal data.
• Pearson correlation computed on ranks
• If the Spearman coefficient is larger than Pearson’s r, it may indicate a non-linear (but monotonic) relationship
• Ties make it difficult to compute exact p values
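The "Pearson on ranks" description can be verified directly. The sketch below assumes made-up data with a curved but strictly increasing relationship, so Spearman's rho should exceed Pearson's r.

```r
# Illustrative data (made up): monotonic but strongly non-linear
x <- 1:10
y <- exp(x / 2)

rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y), method = "pearson")
r_pearson   <- cor(x, y, method = "pearson")

print(c(spearman = rho_builtin,
        pearson_on_ranks = rho_ranks,
        pearson = r_pearson))
```

Because y increases whenever x does, the ranks agree perfectly and rho is 1, while Pearson's r falls below 1 because the relationship is not a straight line.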

Page 13: Association

Kendall’s tau

• For rank/ordinal data
• Evaluate pairs of observations (x_i, y_i) and (x_j, y_j)
• Concordant – (x_i > x_j) and (y_i > y_j) OR (x_i < x_j) and (y_i < y_j)
• Discordant – (x_i > x_j) and (y_i < y_j) OR (x_i < x_j) and (y_i > y_j)

Page 14: Association

Kendall’s tau-a

$$\tau_a = \frac{(\text{No. Concordant}) - (\text{No. Discordant})}{\tfrac{1}{2}\, n(n-1)}$$
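Counting concordant and discordant pairs by hand makes the formula concrete. The sketch below assumes a small made-up data set without ties; in that case tau-a should match the value returned by `cor(..., method = "kendall")`.

```r
# Illustrative data (made up), no ties in x or y
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

n <- length(x)
pairs <- combn(n, 2)   # each column is one pair (i, j) with i < j

# +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) *
     sign(y[pairs[1, ]] - y[pairs[2, ]])

n_conc <- sum(s > 0)   # 12 concordant pairs
n_disc <- sum(s < 0)   # 3 discordant pairs

tau_a <- (n_conc - n_disc) / (n * (n - 1) / 2)
print(tau_a)           # 0.6

# With no ties this agrees with the built-in Kendall correlation
print(cor(x, y, method = "kendall"))
```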

Page 15: Association

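Kendall’s tau-b

• Divide by total number of pairs adjusted for all ties (t_i and u_i are the sizes of the groups of tied values in x and in y)

$$\tau_b = \frac{(\text{No. Concordant}) - (\text{No. Discordant})}{\sqrt{\left(\frac{n(n-1)}{2} - \sum_i \frac{t_i(t_i-1)}{2}\right)\left(\frac{n(n-1)}{2} - \sum_i \frac{u_i(u_i-1)}{2}\right)}}$$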
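The tie adjustment can be sketched in R as follows. The data are made up and contain ties in both variables; `t_i` and `u_i` are taken to be the sizes of the groups of tied values in x and y. The comparison at the end relies on the understanding that `cor(..., method = "kendall")` returns tau-b when ties are present.

```r
# Illustrative data (made up), with ties in both variables
x <- c(1, 1, 2, 3, 3, 4)
y <- c(1, 2, 2, 4, 3, 3)

n <- length(x)
pairs <- combn(n, 2)   # all pairs (i, j) with i < j

# +1 concordant, -1 discordant, 0 if tied on x or on y
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) *
     sign(y[pairs[1, ]] - y[pairs[2, ]])
n_conc <- sum(s > 0)
n_disc <- sum(s < 0)

# Sizes of the tie groups in each variable
t_i <- as.vector(table(x))
u_i <- as.vector(table(y))

n0 <- n * (n - 1) / 2              # total number of pairs
tx <- sum(t_i * (t_i - 1) / 2)     # pairs tied on x
ty <- sum(u_i * (u_i - 1) / 2)     # pairs tied on y

tau_b <- (n_conc - n_disc) / sqrt((n0 - tx) * (n0 - ty))
print(tau_b)

# Built-in Kendall correlation for comparison
print(cor(x, y, method = "kendall"))
```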

Page 16: Association

Kendall’s tau-c

• For grouped (tabled) data where the table is not square (rows ≠ columns)

$$\tau_c = \frac{(\text{No. Concordant}) - (\text{No. Discordant})}{\frac{N^2}{2}\left[\frac{\min(r,c) - 1}{\min(r,c)}\right]}$$
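Base R has no ready-made tau-c function that this sketch relies on, so the example below computes it by hand from a made-up 2 x 3 table of counts (an assumption for illustration, with both rows and columns treated as ordered categories). The table is expanded into one observation per count, and the pairs are then classified as concordant or discordant.

```r
# Illustrative 2 x 3 table of counts (made up); rows and columns ordered
tab <- matrix(c(10, 5, 2,
                 3, 6, 9), nrow = 2, byrow = TRUE)

# Expand the table: one (row index, column index) pair per observation
x <- rep(as.vector(row(tab)), as.vector(tab))
y <- rep(as.vector(col(tab)), as.vector(tab))

N <- length(x)
pairs <- combn(N, 2)   # all pairs of observations

# +1 concordant, -1 discordant, 0 if tied on rows or columns
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) *
     sign(y[pairs[1, ]] - y[pairs[2, ]])
n_conc <- sum(s > 0)
n_disc <- sum(s < 0)

# m is the smaller of the number of rows and columns
m <- min(nrow(tab), ncol(tab))

tau_c <- (n_conc - n_disc) / ((N^2 / 2) * ((m - 1) / m))
print(c(concordant = n_conc, discordant = n_disc, tau_c = tau_c))
```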

Page 17: Association

Nominal Measures

• Measures based on Chi-Square:
  – Phi coefficient
  – Cramer’s V
  – Contingency coefficient
  – Odds ratio

Page 18: Association

Phi and Cramer’s V

• Phi ranges from 0 to 1 in a 2x2 table but can exceed 1 in larger tables. Cramer’s V adds a correction to keep the maximum value at 1 or less:

$$\phi = \sqrt{\frac{\chi^2}{N}} \qquad V = \sqrt{\frac{\chi^2}{N \times \min(r-1,\, c-1)}}$$
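Both formulas can be applied by hand to the Sex by Goods counts printed later in the transcript (38, 28, 16, 30). The sketch below rebuilds that table with `matrix()` rather than from the `EWG2` data frame, which is not available here, and uses the uncorrected Pearson chi-square (`correct = FALSE`), since that is the statistic the formulas call for.

```r
# Counts taken from the Sex x Goods table shown in the transcript
tab <- matrix(c(38, 28,
                16, 30), nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Female", "Male"),
                              Goods = c("Absent", "Present")))

# Pearson chi-square without Yates' continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
N <- sum(tab)

phi <- sqrt(chi2 / N)
V   <- sqrt(chi2 / (N * min(nrow(tab) - 1, ncol(tab) - 1)))

print(round(c(chi2 = chi2, phi = phi, V = V), 4))
```

For a 2x2 table the correction factor is 1, so phi and V coincide; both round to 0.224.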

Page 19: Association

Contingency Coefficient

• Ranges from 0 to a maximum below 1 that depends on the number of rows and columns; values near the maximum indicate a strong relationship and 0 indicates no relationship

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
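As a worked example, the sketch below applies the formula to the Sex by Goods counts printed later in the transcript (38, 28, 16, 30), rebuilt here with `matrix()`. The upper bound `sqrt((k - 1) / k)` with `k = min(rows, columns)` is the commonly quoted maximum of C and is included as an assumption for context, not as something stated on the slide.

```r
# Counts taken from the Sex x Goods table shown in the transcript
tab <- matrix(c(38, 28,
                16, 30), nrow = 2, byrow = TRUE)

# Pearson chi-square without continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
N <- sum(tab)

# Contingency coefficient
C <- sqrt(chi2 / (chi2 + N))

# Commonly quoted upper bound for C (assumption, see lead-in)
k <- min(nrow(tab), ncol(tab))
C_max <- sqrt((k - 1) / k)

print(round(c(C = C, C_max = C_max), 3))
```

Here C rounds to 0.219, well below the bound of about 0.707 for a 2x2 table.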

Page 20: Association

Odds Ratio

• For 2 x 2 tables it shows the relative odds between the two variables

    a  b
    c  d

$$\alpha = \frac{a/c}{b/d} = \frac{ad}{bc}$$
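The arithmetic can be checked on the Sex by Goods counts printed later in the transcript, taking a = 38, b = 28, c = 16, d = 30 in the layout above. This is a hand calculation in base R, not a call to any package function.

```r
# Cell counts from the Sex x Goods table in the transcript
a <- 38; b <- 28   # Female: Absent, Present
c_ <- 16; d <- 30  # Male:   Absent, Present (c_ avoids masking c())

# Two equivalent forms of the odds ratio
or_ratio   <- (a / c_) / (b / d)
or_product <- (a * d) / (b * c_)

print(c(ratio_form = or_ratio, cross_product = or_product))  # both 2.544643
```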

Page 21: Association

> Table <- xtabs(~Sex+Goods, data=EWG2)
> Table
        Goods
Sex      Absent Present
  Female     38      28
  Male       16      30
> ChiSq <- chisq.test(Table)
> ChiSq

	Pearson's Chi-squared test with Yates' continuity correction

data:  Table
X-squared = 4.7644, df = 1, p-value = 0.02905

Page 22: Association

> library(vcd)
> assocstats(Table)
                    X^2 df P(> X^2)
Likelihood Ratio 5.7073  1 0.016894
Pearson          5.6404  1 0.017552

Phi-Coefficient   : 0.224
Contingency Coeff.: 0.219
Cramer's V        : 0.224
> cor(as.numeric(EWG2$Sex), as.numeric(EWG2$Goods), use="complete.obs")
[1] 0.2244111
> oddsratio(Table, log=FALSE)
[1] 2.544643