TRANSCRIPT
Slide 1: G89.2228 Lect 10a
G89.2228 Lecture 10a
• Revisited Example: Okazaki’s inferences from a survey
• Inferences on correlation
• Correlation: Power and effect size
• Regression: Expected Y given X
• Inference on regression
• Return to example
Slide 2: G89.2228 Lect 10a
Example: Okazaki’s Inferences from a survey
• Does self-construal account for the relation of adverse functioning with Asian status?
• Survey of 348 students
• Self-reported Interdependence was correlated .53 with self-reported Fear of Negative Evaluation
• Illustrative plot (simulated) of r=.53
[Figure: scatter plot titled “Bivariate Normal With .53 Correlation”; X on the horizontal axis, Y on the vertical axis]
Slide 3: G89.2228 Lect 10a
Review of Correlation Definitions
• In a population with variables X and Y, the correlation is
  $\rho_{XY} = \sigma_{XY} / (\sigma_X \sigma_Y)$
• If we have a sample from the population, we can calculate the product moment estimate:
  $r_{XY} = s_{XY} / (s_X s_Y)$
• To estimate the population value, the (X,Y) pairs should be representative
• The sampling distribution of r_XY is not simple. The standard error of r actually depends on knowing ρ.
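A minimal Python sketch of the product moment estimate above, assuming x and y are equal-length lists of paired observations:

```python
import math

def pearson_r(x, y):
    """Product-moment correlation: r_XY = s_XY / (s_X * s_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)
```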
Slide 4: G89.2228 Lect 10a
Inferences on correlation
• Testing H0: ρ = 0 when either X or Y is normally distributed
  – A statistic that can be justified from a regression approach is
    $t_{N-2} = \dfrac{r\sqrt{N-2}}{\sqrt{1-r^2}}$
  – We usually do not compute a standard error for r, because it depends on ρ itself.
• For other inferences on one or more correlations, we use Fisher’s so-called z transformation:
  $z_r = \dfrac{1}{2}\ln\dfrac{1+r}{1-r}$
• The standard error of $z_r$ is $1/\sqrt{N-3}$
• Howell shows how CIs and comparisons of correlations from independent samples can be computed using $z_r$.
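A short Python sketch of the t statistic and Fisher’s z transformation defined above, assuming r and n are the sample correlation and sample size:

```python
import math

def r_to_t(r, n):
    """t statistic on N-2 df for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def fisher_z(r):
    """Fisher's z transformation: z_r = (1/2) ln[(1+r)/(1-r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def se_fisher_z(n):
    """Standard error of z_r: 1/sqrt(N-3)."""
    return 1 / math.sqrt(n - 3)
```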
Slide 5: G89.2228 Lect 10a
Example: Okazaki’s correlation
• Test of H0: ρ = 0
  r = .53 and N = 348
  $t_{346} = .53\sqrt{346}\,/\sqrt{1-.53^2} = 11.63$
  The null hypothesis is rejected.
• Confidence Interval for ρ
  Compute $z_r = \frac{1}{2}\ln\frac{1+r}{1-r} = \frac{1}{2}\ln\frac{1.53}{.47} = .59$
  Compute $SE(z_r) = 1/\sqrt{N-3} = 1/\sqrt{345} = .0538$
  Compute confidence interval: $z_r \pm 1.96\,SE(z_r) = .59 \pm 1.96 \times .0538 \Rightarrow (.485,\ .696)$
  Transform back using $r = (e^{2z}-1)/(e^{2z}+1)$:
  $(.485,\ .696) \rightarrow \left(\dfrac{e^{2(.485)}-1}{e^{2(.485)}+1},\ \dfrac{e^{2(.696)}-1}{e^{2(.696)}+1}\right) = (.45,\ .60)$
• Note that the resulting confidence interval is asymmetric
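A Python sketch that reproduces the worked example above from r = .53 and N = 348:

```python
import math

r, n = 0.53, 348
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t(346), about 11.63
z = 0.5 * math.log((1 + r) / (1 - r))               # Fisher z, about .59
se = 1 / math.sqrt(n - 3)                           # about .0538
lo, hi = z - 1.96 * se, z + 1.96 * se               # about (.485, .696)

def back(z):
    """Inverse Fisher transform: r = (e^(2z) - 1) / (e^(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

print(round(t, 2), (round(back(lo), 2), round(back(hi), 2)))  # 11.63 (0.45, 0.6)
```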
Slide 6: G89.2228 Lect 10a
Correlation: power and effect size
• Cohen’s rule of thumb for correlation effect sizes (both ρ above and differences in Fisher’s z transformation) is:
  small = .1, medium = .3, large = .5
• Example (Okazaki, continued): N = 348 gives 97% power to detect ρ = .20 with a two-tailed test, α = .05.
• If ρ = .10, this N would only give 47% power.
• The Power and Precision program and Howell’s approximate method give similar results
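A simple approximation of this power calculation, based on Fisher’s z and in the spirit of Howell’s approximate method (a sketch only, not the exact routine used by the Power and Precision program):

```python
import math
from statistics import NormalDist

def approx_power(rho, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0 via Fisher's z."""
    delta = 0.5 * math.log((1 + rho) / (1 - rho)) * math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Power = P(reject) under the alternative, combining both tails
    return NormalDist().cdf(delta - z_crit) + NormalDist().cdf(-delta - z_crit)

print(approx_power(0.20, 348))  # roughly .96-.97
print(approx_power(0.10, 348))  # roughly .46-.47
```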
Slide 7: G89.2228 Lect 10a
Regression: Expected Y given X
• When Y and X are correlated, the expected value of Y varies with X. E(Y|X) is not constant for different choices of X.
• We could chop up the plot of Y and X and compute separate means of Y for different value ranges of X
• Often this set of Conditional Expectations of Y given X can be described by a linear model
• Instead of estimating many means of Y|X, we estimate a* and b*, the y-intercept and the slope of the line.
[Figure: scatter plot of Y against X with a fitted line; X on the horizontal axis, Y on the vertical axis]
E(Y|X) = a* + b*X
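A sketch of the “chop up the plot” idea above, computing the mean of Y within coarse ranges of X; x, y, and cutpoints are hypothetical inputs, with cutpoints assumed sorted in ascending order:

```python
def conditional_means(x, y, cutpoints):
    """Mean of Y within each X range defined by the cutpoints."""
    bins = {}
    for xi, yi in zip(x, y):
        key = sum(xi > c for c in cutpoints)   # index of the X range xi falls in
        bins.setdefault(key, []).append(yi)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}
```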
Slide 8: G89.2228 Lect 10a
Regression coefficients as parameters
• If Y and X are known to have a bivariate normal distribution, then the relation between these is known to be linear.
• The conditional distribution of Y given X is expressed with parameters a* and b*.
• a* and b* may also derive meaning from structural models: Y is assumed to be caused by X. This assumption cannot be tested, but the strength of the causal path under the model can be assessed.
• In some cases, we do not assume that a* and b* have any deep meaning, or that the true relation between Y and X is exactly linear. Instead, linear regression is used as an approximate predictive model.
Slide 9: G89.2228 Lect 10a
Estimating regression statistics
• b* and a* can be estimated using ordinary least squares methods. The resulting estimates are:
  $\hat{b} = b = r_{XY}\dfrac{s_Y}{s_X} = \dfrac{s_{XY}}{s_X^2}$,  $\hat{a} = a = \bar{Y} - b\bar{X}$
  They minimize the sum of squared residuals, $\sum_i (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = a + bX_i$ is the predicted value of $Y_i$.
• If $s_Y = s_X$, then $b = r_{XY}$.
• If $s_Y = s_X = 1$, then $b = r_{XY}$.
• The slope of Y regressed on X is not generally the same as the slope of X regressed on Y.
• The constant a* is the expected value of Y when X = 0.
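A minimal sketch of the ordinary least squares estimates above, assuming x and y are equal-length lists of paired observations:

```python
def ols_slope_intercept(x, y):
    """Least-squares estimates: b = s_XY / s_X^2, a = mean(Y) - b * mean(X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    b = s_xy / s_x2
    a = my - b * mx
    return a, b
```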
Slide 10: G89.2228 Lect 10a
Inference on regression
• The regression model is:
  $Y = a^* + b^*X + e$
• $s_b = \dfrac{s_{Y \cdot X}}{s_X\sqrt{N-1}}$
  where $s_{Y \cdot X}$ is the standard deviation of the residuals $Y_i - \hat{Y}_i$:
  $s_{Y \cdot X} = s_Y\sqrt{\dfrac{N-1}{N-2}}\sqrt{1-r^2}$
• The estimates a and b will have normal distributions because of the central limit theorem.
• The standard error of b is based on N-2 degrees of freedom.
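A sketch of the standard error of b computed from summary statistics, following the formulas above:

```python
import math

def se_slope(s_y, s_x, r, n):
    """s_b = s_YX / (s_X * sqrt(N-1)),
    with s_YX = s_Y * sqrt((N-1)/(N-2)) * sqrt(1 - r^2)."""
    s_yx = s_y * math.sqrt((n - 1) / (n - 2)) * math.sqrt(1 - r ** 2)
    return s_yx / (s_x * math.sqrt(n - 1))
```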
Slide 11: G89.2228 Lect 10a
Inference on regression (continued)
• To test H0: b=0, construct a t-test:
• t = b/s_b, on N-2 degrees of freedom.
• To construct a 95% CI around the regression parameter, compute
  $b \pm t_{.975,\,N-2}\, s_b$
• The t-test will be identical to that for correlation. The CI will be about b*, not ρ, and hence won’t correspond to the one for correlation (calculated using Fisher’s z transformation).
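A minimal sketch of the slope t statistic and CI from the bullets above; t_crit defaults to 1.96, which is appropriate only when N-2 is large:

```python
def slope_test_and_ci(b, s_b, t_crit=1.96):
    """t = b / s_b (compare to t on N-2 df); CI = b +/- t_crit * s_b."""
    t = b / s_b
    return t, (b - t_crit * s_b, b + t_crit * s_b)
```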
Slide 12: G89.2228 Lect 10a
Okazaki: Predicting Fear of Negative Evaluation from Interdependence
• From the data in her table 2, we compute
– Mean of interdependence=4.49
– Var(interdependence)=.65, SX=.808
– Mean of FNE=38.52
– Var(FNE)=104.08, SY=10.202
• Compute b and a
  b = rYX(SY/SX) = (.53)(10.2/.81) = 6.69
  a = Ȳ − bX̄ = 38.52 − (6.69)(4.49) = 8.46
  Y = 8.46 + 6.69X + e
• Compute standard errors
  SY•X = √[(104.08)(1 − .53²)] = √74.84 = 8.65
  Sb = SY•X/[SX √(N−1)] = .575
• Test statistic and CI
  t(N−2) = b/Sb = 6.69/.575 = 11.6
  CI: b ± (1.96)(Sb) => (5.56, 7.82)
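A Python sketch that reproduces this slide’s computations from the summary statistics in Okazaki’s table 2:

```python
import math

n, r = 348, 0.53
mean_x, s_x = 4.49, 0.808      # interdependence
mean_y, s_y = 38.52, 10.202    # Fear of Negative Evaluation

b = r * s_y / s_x                              # about 6.69
a = mean_y - b * mean_x                        # about 8.5 (slide rounds to 8.46)
s_yx = math.sqrt(s_y ** 2 * (1 - r ** 2))      # about 8.65; the (N-1)/(N-2) factor is negligible here
s_b = s_yx / (s_x * math.sqrt(n - 1))          # about .575
t = b / s_b                                    # about 11.6
ci = (b - 1.96 * s_b, b + 1.96 * s_b)          # about (5.56, 7.82)
```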