TRANSCRIPT
Slide 1: G89.2228 Lect 10a
G89.2228 Lecture 10a
• Revisited Example: Okazaki’s inferences from a survey
• Inferences on correlation
• Correlation: Power and effect size
• Regression: Expected Y given X
• Inference on regression
• Return to example
Slide 2: G89.2228 Lect 10a
Example: Okazaki’s Inferences from a survey
• Does self-construal account for the relation of adverse functioning with Asian status?
• Survey of 348 students
• Self-reported Interdependence was correlated .53 with self-reported Fear of Negative Evaluation
• Illustrative plot (simulated) of r=.53
[Figure: scatter plot titled “Bivariate Normal With .53 Correlation”; X on the horizontal axis, Y on the vertical axis]
Slide 3: G89.2228 Lect 10a
Review of Correlation Definitions
• In a population with variables X and Y, the correlation is
  $\rho_{XY} = \sigma_{XY} / (\sigma_X \sigma_Y)$
• If we have a sample from the population, we can calculate the product moment estimate:
  $r_{XY} = s_{XY} / (s_X s_Y)$
• To estimate the population value, the (X,Y) pairs should be representative
• The sampling distribution of r_XY is not simple. The standard error of r actually depends on knowing ρ.
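A minimal Python sketch of the product moment estimate above, assuming x and y are equal-length lists of paired observations:

```python
import math

def pearson_r(x, y):
    """Product-moment correlation: r_XY = s_XY / (s_X * s_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)
```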
Slide 4: G89.2228 Lect 10a
Inferences on correlation
• Testing H0: ρ = 0 when either X or Y is normally distributed
  – A statistic that can be justified from a regression approach is
    $t_{N-2} = \dfrac{r\sqrt{N-2}}{\sqrt{1-r^2}}$
  – We usually do not compute a standard error for r, because it depends on ρ itself.
• For other inferences on one or more correlations, we use Fisher’s so-called z transformation:
  $z_r = \dfrac{1}{2}\ln\dfrac{1+r}{1-r}$
• The standard error of $z_r$ is $1/\sqrt{N-3}$
• Howell shows how CIs and comparisons of correlations from independent samples can be computed using $z_r$.
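A short Python sketch of the t statistic and Fisher’s z transformation defined above, assuming r and n are the sample correlation and sample size:

```python
import math

def r_to_t(r, n):
    """t statistic on N-2 df for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def fisher_z(r):
    """Fisher's z transformation: z_r = (1/2) ln[(1+r)/(1-r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def se_fisher_z(n):
    """Standard error of z_r: 1/sqrt(N-3)."""
    return 1 / math.sqrt(n - 3)
```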
Slide 5: G89.2228 Lect 10a
Example: Okazaki’s correlation
• Test of H0: ρ = 0
  r = .53 and N = 348
  $t_{346} = .53\sqrt{346}\,/\sqrt{1-.53^2} = 11.63$
  The null hypothesis is rejected.
• Confidence Interval for ρ
  Compute $z_r = \frac{1}{2}\ln\frac{1+r}{1-r} = \frac{1}{2}\ln\frac{1.53}{.47} = .59$
  Compute $SE(z_r) = 1/\sqrt{N-3} = 1/\sqrt{345} = .0538$
  Compute confidence interval: $z_r \pm 1.96\,SE(z_r) = .59 \pm 1.96 \times .0538 \Rightarrow (.485,\ .696)$
  Transform back using $r = (e^{2z}-1)/(e^{2z}+1)$:
  $(.485,\ .696) \rightarrow \left(\dfrac{e^{2(.485)}-1}{e^{2(.485)}+1},\ \dfrac{e^{2(.696)}-1}{e^{2(.696)}+1}\right) = (.45,\ .60)$
• Note that the resulting confidence interval is asymmetric
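A Python sketch that reproduces the worked example above from r = .53 and N = 348:

```python
import math

r, n = 0.53, 348
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t(346), about 11.63
z = 0.5 * math.log((1 + r) / (1 - r))               # Fisher z, about .59
se = 1 / math.sqrt(n - 3)                           # about .0538
lo, hi = z - 1.96 * se, z + 1.96 * se               # about (.485, .696)

def back(z):
    """Inverse Fisher transform: r = (e^(2z) - 1) / (e^(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

print(round(t, 2), (round(back(lo), 2), round(back(hi), 2)))  # 11.63 (0.45, 0.6)
```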
Slide 6: G89.2228 Lect 10a
Correlation: power and effect size
• Cohen’s rule of thumb for correlation effect sizes (both ρ above and differences in Fisher’s z transformation) is:
  small = .1, medium = .3, large = .5
• Example (Okazaki, continued): N = 348 gives 97% power to detect ρ = .20 with a two-tailed test, α = .05.
• If ρ = .10, this N would only give 47% power.
• The Power and Precision program and Howell’s approximate method give similar results
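A simple approximation of this power calculation, based on Fisher’s z and in the spirit of Howell’s approximate method (a sketch only, not the exact routine used by the Power and Precision program):

```python
import math
from statistics import NormalDist

def approx_power(rho, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0 via Fisher's z."""
    delta = 0.5 * math.log((1 + rho) / (1 - rho)) * math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Power = P(reject) under the alternative, combining both tails
    return NormalDist().cdf(delta - z_crit) + NormalDist().cdf(-delta - z_crit)

print(approx_power(0.20, 348))  # roughly .96-.97
print(approx_power(0.10, 348))  # roughly .46-.47
```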
Slide 7: G89.2228 Lect 10a
Regression: Expected Y given X
• When Y and X are correlated, the expected value of Y varies with X. E(Y|X) is not constant for different choices of X.
• We could chop up the plot of Y and X and compute separate means of Y for different value ranges of X
• Often this set of Conditional Expectations of Y given X can be described by a linear model
• Instead of estimating many means of Y|X, we estimate a* and b*, the y-intercept and the slope of the line.
[Figure: scatter plot of Y against X with a fitted line; X on the horizontal axis, Y on the vertical axis]
E(Y|X) = a* + b*X
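A sketch of the “chop up the plot” idea above, computing the mean of Y within coarse ranges of X; x, y, and cutpoints are hypothetical inputs, with cutpoints assumed sorted in ascending order:

```python
def conditional_means(x, y, cutpoints):
    """Mean of Y within each X range defined by the cutpoints."""
    bins = {}
    for xi, yi in zip(x, y):
        key = sum(xi > c for c in cutpoints)   # index of the X range xi falls in
        bins.setdefault(key, []).append(yi)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}
```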
Slide 8: G89.2228 Lect 10a
Regression coefficients as parameters
• If Y and X are known to have a bivariate normal distribution, then the relation between these is known to be linear.
• The conditional distribution of Y given X is expressed with parameters a* and b*.
• a* and b* may also derive meaning from structural models: Y is assumed to be caused by X. This assumption cannot be tested, but the strength of the causal path under the model can be assessed.
• In some cases, we do not assume that a* and b* have any deep meaning, or that the true relation between Y and X is exactly linear. Instead, linear regression is used as an approximate predictive model.
Slide 9: G89.2228 Lect 10a
Estimating regression statistics
• b* and a* can be estimated using ordinary least squares methods. The resulting estimates are:
  $\hat{b} = b = r_{XY}\dfrac{s_Y}{s_X} = \dfrac{s_{XY}}{s_X^2}$,  $\hat{a} = a = \bar{Y} - b\bar{X}$
  They minimize the sum of squared residuals, $\sum_i (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = a + bX_i$ is the predicted value of $Y_i$.
• If $s_Y = s_X$, then $b = r_{XY}$.
• If $s_Y = s_X = 1$, then $b = r_{XY}$.
• The slope of Y regressed on X is not generally the same as the slope of X regressed on Y.
• The constant a* is the expected value of Y when X = 0.
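A minimal sketch of the ordinary least squares estimates above, assuming x and y are equal-length lists of paired observations:

```python
def ols_slope_intercept(x, y):
    """Least-squares estimates: b = s_XY / s_X^2, a = mean(Y) - b * mean(X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    b = s_xy / s_x2
    a = my - b * mx
    return a, b
```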
Slide 10: G89.2228 Lect 10a
Inference on regression
• The regression model is:
  $Y = a^* + b^*X + e$
• $s_b = \dfrac{s_{Y \cdot X}}{s_X\sqrt{N-1}}$
  where $s_{Y \cdot X}$ is the standard deviation of the residuals $Y_i - \hat{Y}_i$:
  $s_{Y \cdot X} = s_Y\sqrt{\dfrac{N-1}{N-2}}\sqrt{1-r^2}$
• The estimates a and b will have normal distributions because of the central limit theorem.
• The standard error of b is based on N-2 degrees of freedom.
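A sketch of the standard error of b computed from summary statistics, following the formulas above:

```python
import math

def se_slope(s_y, s_x, r, n):
    """s_b = s_YX / (s_X * sqrt(N-1)),
    with s_YX = s_Y * sqrt((N-1)/(N-2)) * sqrt(1 - r^2)."""
    s_yx = s_y * math.sqrt((n - 1) / (n - 2)) * math.sqrt(1 - r ** 2)
    return s_yx / (s_x * math.sqrt(n - 1))
```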
Slide 11: G89.2228 Lect 10a
Inference on regression (continued)
• To test H0: b=0, construct a t-test:
• t = b/s_b, on N-2 degrees of freedom.
• To construct a 95% CI around the regression parameter, compute
  $b \pm t_{.975,\,N-2}\, s_b$
• The t-test will be identical to that for correlation. The CI will be about b*, not ρ, and hence won’t correspond to the one for correlation (calculated using Fisher’s z transformation).
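A minimal sketch of the slope t statistic and CI from the bullets above; t_crit defaults to 1.96, which is appropriate only when N-2 is large:

```python
def slope_test_and_ci(b, s_b, t_crit=1.96):
    """t = b / s_b (compare to t on N-2 df); CI = b +/- t_crit * s_b."""
    t = b / s_b
    return t, (b - t_crit * s_b, b + t_crit * s_b)
```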
Slide 12: G89.2228 Lect 10a
Okazaki: Predicting Fear of Negative Evaluation from Interdependence
• From the data in her table 2, we compute
– Mean of interdependence=4.49
– Var(interdependence)=.65, SX=.808
– Mean of FNE=38.52
– Var(FNE)=104.08, SY=10.202
• Compute b and a
  b = rYX(SY/SX) = (.53)(10.2/.81) = 6.69
  a = Ȳ − bX̄ = 38.52 − (6.69)(4.49) = 8.46
  Y = 8.46 + 6.69X + e
• Compute standard errors
  SY•X = √[(104.08)(1 − .53²)] = √74.84 = 8.65
  Sb = SY•X/[SX √(N−1)] = .575
• Test statistic and CI
  t(N−2) = b/Sb = 6.69/.575 = 11.6
  CI: b ± (1.96)(Sb) => (5.56, 7.82)
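A Python sketch that reproduces this slide’s computations from the summary statistics in Okazaki’s table 2:

```python
import math

n, r = 348, 0.53
mean_x, s_x = 4.49, 0.808      # interdependence
mean_y, s_y = 38.52, 10.202    # Fear of Negative Evaluation

b = r * s_y / s_x                              # about 6.69
a = mean_y - b * mean_x                        # about 8.5 (slide rounds to 8.46)
s_yx = math.sqrt(s_y ** 2 * (1 - r ** 2))      # about 8.65; the (N-1)/(N-2) factor is negligible here
s_b = s_yx / (s_x * math.sqrt(n - 1))          # about .575
t = b / s_b                                    # about 11.6
ci = (b - 1.96 * s_b, b + 1.96 * s_b)          # about (5.56, 7.82)
```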