8 the t-test
Post on 10-Feb-2016
223 Views
Preview:
DESCRIPTION
TRANSCRIPT
The t-test
Inferences about Population Means
Questions• What is the main use of the t-test?• How is the distribution of t related to the unit
normal?• When would we use a t-test instead of a z-test?
Why might we prefer one to the other?• What are the chief varieties or forms of the t-
test? • What is the standard error of the difference
between means? What are the factors that influence its size?
More Questions• Identify the appropriate to version of t
to use for a given design.• Compute and interpret t-tests
appropriately.• Given that
construct a rejection region. Draw a picture to illustrate.
01.2;49;14;75:;75: )48,05(.10 tNsHH y
Background• The t-test is used to test hypotheses
about means when the population variance is unknown (the usual case). Closely related to z, the unit normal.
• Developed by Gossett for the quality control of beer.
• Comes in 3 varieties:• Single sample, independent samples, and
dependent samples.
What kind of t is it?• Single sample t – we have only 1 group; want
to test against a hypothetical mean.• Independent samples t – we have 2 means, 2
groups; no relation between groups, e.g., people randomly assigned to a single group.
• Dependent t – we have two means. Either same people in both groups, or people are related, e.g., husband-wife, left hand-right hand, hospital patient and visitor.
Single-sample z test• For large samples (N>100) can use z to
test hypotheses about means.
• Suppose
• Then
• If
MM est
Xz
.)(
N
NXX
Nsest X
M1
)(
.
2
200;5;10:;10: 10 NsHH X
35.14.14
52005.
Nsest X
M
05.96.183.2;83.235.
)1011(11
pzX
The t DistributionWe use t when the population variance is unknown (the usual case) and sample size is small (N<100, the usual case). If you use a stat package for testing hypotheses about means, you will use t.
The t distribution is a short, fat relative of the normal. The shape of t depends on its df. As N becomes infinitely large, t becomes normal.
Degrees of FreedomFor the t distribution, degrees of freedom are always a simple function of the sample size, e.g., (N-1).
One way of explaining df is that if we know the total or mean, and all but one score, the last (N-1) score is not free to vary. It is fixed by the other scores. 4+3+2+X = 10. X=1.
Single-sample t-testWith a small sample size, we compute the same numbers as we did for z, but we compare them to the t distribution instead of the z distribution.
25;5;10:;10: 10 NsHH X
1255.
Nsest X
M 11
)1011(11
tX
064.2)24,05(. t 1<2.064, n.s.
Interval = ]064.13,936.8[)1(064.211ˆ
MtX
Interval is about 9 to 13 and contains 10, so n.s.
(c.f. z=1.96)
ReviewHow are the distributions of z and t related? Given that
construct a rejection region. Draw a picture to illustrate.
01.2;49;14;75:;75: )48,05(.10 tNsHH y
Difference Between Means (1)• Most studies have at least 2 groups
(e.g., M vs. F, Exp vs. Control)• If we want to know diff in population
means, best guess is diff in sample means.
• Unbiased:• Variance of the Difference:• Standard Error:
22
2121 )var( MMyy
212121 )()()( yEyEyyE
22
21 MMdiff
Difference Between Means (2)• We can estimate the standard error of
the difference between means.
• For large samples, can use z
22
21 ... MMdiff estestest
diffestXX
diffz )()( 2121
3;100;12
2;100;10
0:;0:
222
111
211210
SDNX
SDNX
HH
36.10013
1009
1004. diffest
05.;56.536.2
36.0)1210( pzdiff
Independent Samples t (1)• Looks just like z:• df=N1-1+N2-1=N1+N2-2• If SDs are equal, estimate is:
diffestyy
difft )()( 2121
21
2
2
2
1
2 11NNNNdiff
Pooled variance estimate is weighted average:)]2/(1/[])1()1[( 21
222
211
2 NNsNsNPooled Standard Error of the Difference (computed):
21
21
21
222
211
2)1()1(.
NNNN
NNsNsNest diff
Independent Samples t (2)
21
21
21
222
211
2)1()1(.
NNNN
NNsNsNest diff
diffestyy
difft )()( 2121
7;83.5;20
5;7;18
0:;0:
2222
1211
211210
Nsy
Nsy
HH
47.13512
275)83.5(6)7(4.
diffest
..;36.147.12
47.10)2018( sntdiff
tcrit = t(.05,10)=2.23
ReviewWhat is the standard error of the
difference between means? What are the factors that influence its size?
Describe a design (what IV? What DV?) where it makes sense to use the independent samples t test.
Dependent t (1)Observations come in pairs. Brother, sister, repeated measure.
),cov(2 212
22
12 yyMMdiff
Problem solved by finding diffs between pairs Di=yi1-yi2.
1)( 2
2
N
DDs i
D Nsest D
MD .ND
D i)(
MDestDEDt
.)(
df=N(pairs)-1
Dependent t (2)Brother Sister5 77 83 3
5y 6y
Diff2 11 00 1
1D
58.3/1. MDest
72.158.1
.)(
MDestDEDt
11
)( 2
N
DDsD
2)( DD
Assumptions• The t-test is based on assumptions of
normality and homogeneity of variance.• You can test for both these (make sure
you learn the SAS methods).• As long as the samples in each group
are large and nearly equal, the t-test is robust, that is, still good, even tho assumptions are not met.
Review• Describe a design where it makes sense
to use a single-sample t.• Describe a design where it makes sense
to use a dependent samples t.
Strength of Association (1)• Scientific purpose is to predict or
explain variation.• Our variable Y has some variance that
we would like to account for. There are statistical indexes of how well our IV accounts for variance in the DV. These are measures of how strongly or closely associated our Ivs and DVs are.
• Variance accounted for:
2
221
2
2|
22
4)(
YY
XYY
Strength of Association (2)• How much of variance in Y is
associated with the IV? 2
221
2
2|
22
4)(
YY
XYY
6420-2-4
0.4
0.3
0.2
0.1
0.0
Compare the 1st (left-most) curve with the curve in the middle and the one on the right.
In each case, how much of the variance in Y is associated with the IV, group membership? More in the second comparison. As mean diff gets big, so does variance acct.
Association & Significance• Power increases with association (effect
size) and sample size.• Effect size:• Significance = effect size X sample
size.
pXXd /)( 21
21
2
21
11
)(
NN
XXt
p
Increasing sample size does not increase effect size (strength of association). It decreases the standard error so power is greater, |t| is larger.
N
Xt
2
)(
(independent samples)
(single sample)
Ndt
Estimating Power (1)• If the null is false, the statistic is no
longer distributed as t, but rather as noncentral t. This makes power computation difficult.
• Howell introduces the noncentrality parameter delta to use for estimating power. For the one-sample t,
nd Recall the relations between t and d on the previous slide
Estimating Power (2)• Suppose (Howell, p. 231) that we have
25 people, a sample mean of 105, and a hypothesized mean and SD of 100 and 15, respectively. Then
33.3/115
100105
d
38.65.12533.
powernd
Howell presents an appendix where delta is related to power. For power = .8, alpha = .05, delta must be 2.80. To solve for N, we compute:
91.7148.833.
8.2; 222
dnnd
Estimating Power (3)• Dependent t can be cast as a single
sample t using difference scores.• Independent t. To use Howell’s
method, the result is n per group, so double it. Suppose d = .5 (medium effect) and n =25 per group.
77.15.125.22550.
2
nd From Howell’s appendix, the value of delta of 1.77 with alpha = .05 results in power of .43. For a power of .8, we need delta = 2.80
72.625.8.222
22
dn
Need 63 per group.
SAS Proc Power – single sample exampleproc power; onesamplemeans test=t nullmean = 100 mean = 105 stddev = 15 power = .8 ntotal = . ; run;
The POWER Procedure One-sample t Test for Mean Fixed Scenario Elements Distribution Normal Method Exact Null Mean 100 Mean 105 Standard Deviation 15 Nominal Power 0.8 Number of Sides 2 Alpha 0.05 Computed N Total Actual N Power Total 0.802 73
2 sample t Power;
proc power;twosamplemeansmeandiff= .5stddev=1power=0.8ntotal=.;run;
Two-sample t Test for Mean Difference Fixed Scenario Elements Distribution Normal Method Exact Mean Difference 0.5 Standard Deviation 1 Nominal Power 0.8 Number of Sides 2 Null Difference 0 Alpha 0.05 Group 1 Weight 1 Group 2 Weight 1 Computed N Total Actual N Power Total 0.801 128
Calculate sample size
2 sample t Power • proc power;• twosamplemeans• meandiff = 5 [assumed
difference]• stddev =10 [assumed SD]• sides = 1 [1 tail]• ntotal = 50 [25 per group]• power = .; *[tell me!];• run;
The POWER Procedure Two-Sample t Test for Mean Difference Fixed Scenario Elements
Distribution NormalMethod ExactNumber of Sides 1Mean Difference 5Standard Deviation 10Total Sample Size 50Null Difference 0Alpha 0.05Group 1 Weight 1Group 2 Weight 1
Computed PowerPower0.539
Typical Power in Psych• Average effect size is about d=.40.• Consider power for effect sizes between
.3 and .6. What kind of sample size do we need for power of .8?proc power;twosamplemeansmeandiff= .3 to .6 by .1stddev=1power=.8ntotal=.;plot x= power min = .5 max=.95;run;
Two-sample t Test for 1 Computed N Total Mean Actual N Index Diff Power Total 1 0.3 0.801 352 2 0.4 0.804 200 3 0.5 0.801 128 4 0.6 0.804 90
Typical studies are underpowered.
Power Curves
0.5 0.6 0.7 0.8 0.9 1.0
Power
0
100
200
300
400
500
600
Tota
l Sam
ple
Size
Mean Diff 0.30.40.50.6
Why a whopper of an IV is helpful.
Review• About how many people total will you
need for power of .8, alpha is .05 (two tails), and an effect size of .3?
• You can only afford 40 people per group, and based on the literature, you estimate the group means to be 50 and 60 with a standard deviation within groups of 20. What is your power estimate?
top related