T-Tests and Chi2: Does your sample data reflect the population from which it is drawn?


Upload: colin-atkinson

Post on 11-Jan-2016


Single Group Z and T-Tests

• The basic goal of these simple tests is to show that the distribution of the data under examination was not produced by chance and that there is some systematic pattern in it.

• The main point is to show that the mean of a sample reflects the mean of the population.

• Salkind’s text skips a discussion of single group/sample T-Tests.

Review of Z-Tests

• Recall that a Z-score measures the location of a given value on a normal distribution, which can be expressed as a probability.

• A Z-Test uses the normal distribution to obtain a test statistic from some data, which can then be compared with a sampling distribution of chance: an abstract construction describing the values the statistic would take across repeated samples.

• This is parameter estimation: an inference about a population based on a sample of data.
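As a small illustration of turning a Z-score into a probability, here is a minimal sketch using Python's standard library. The score, mean, and standard deviation are made-up numbers chosen for the example, not values from the slides.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical numbers: a score of 115 on a scale with mean 100 and SD 15.
z = z_score(115, 100, 15)

# Area under the standard normal curve to the left of z.
p_below = NormalDist().cdf(z)
# Two-tailed probability of a value at least this far from the mean.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, P(Z < z) = {p_below:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

The two-tailed figure is the kind of probability a Z-Test reports when the hypothesis is not directional.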

Problem with Z Tests

• Because we rarely know the population parameters, such as the variance σ², we estimate them from the sample. The sample mean is a single value, or “point estimate,” of the population mean.

• However, this sample mean may differ greatly from the real population mean, μ. This error is called “sampling error.”

Problem with Z Tests

• A confidence interval is set up to estimate μ. This is a range of values, centered on the sample mean, that is likely to include the value of the population mean. The larger the sample, the closer the sample mean should be to the population mean, but there may still be some error, which the confidence interval expresses. How far is X̄ from μ?

Student’s T-Test

• Problem: We may not know the mean and variance of some populations, which means we cannot do a Z-Test. In this case, we use a T-test, Student’s T to be specific, for use with a single group or sample of data.

• Again, this is when we are not looking at different groups but at a sample of data in its entirety. We will next examine differences between groups.

Student’s T-Test

• One uses this test when the population variance is unknown, as is usually the case in the social sciences.

• The standard error of the sampling distribution of the sample mean is estimated from the sample.

• A t distribution (not the normal curve: it is lower at the center with heavier tails, but its mean is also 0) is used to obtain critical values and to create confidence intervals.

T Distribution

• Very similar to the Z distribution, and it likewise assumes normality.

• The t distribution becomes practically indistinguishable from the normal distribution after about 100 observations.

• Basic rule of parameter estimation: the larger the number of observations (N) in the sample, the more the sample reflects the overall population.

The t formula

$$ t = \frac{\bar{Y} - \mu_y}{S_y / \sqrt{N-1}} $$

$$ CI = \bar{Y} \pm t_{\alpha/2}\left(S_y / \sqrt{N-1}\right) $$

For α = .05 and N = 30, t = 2.045

95% CI using t-test

• Mean = 20

• Sy = 5

• N = 20

With N − 1 = 19 degrees of freedom, t = 2.093, so 20 ± 2.093 (5/√19) = 20 ± 2.40, giving 22.40 upper and 17.60 lower.
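The interval for this example can be checked with a short sketch. It assumes the standard error takes the form Sy/√(N − 1) and takes the critical value 2.093 from a t table (two-tailed, α = .05, 19 degrees of freedom); the function name is only illustrative.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """CI = mean +/- t_crit * sd / sqrt(n - 1)."""
    margin = t_crit * sd / math.sqrt(n - 1)
    return mean - margin, mean + margin

# Slide values: mean = 20, Sy = 5, N = 20.
# 2.093 is the tabled two-tailed critical t for alpha = .05 and 19 df.
lower, upper = t_confidence_interval(mean=20, sd=5, n=20, t_crit=2.093)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Passing a larger `n` with the same `sd` narrows the interval, which is the "larger sample, better estimate" rule in numeric form.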

T-Tests: Independent Samples

T-Tests of Independence

• Used to test whether there is a significant difference between the means of two samples.

• We are testing for independence, meaning whether or not the two samples are related.

• This is a one-time test, not a test over time with multiple observations.

T-Test of Independence

• Useful in experiments where people are assigned to two groups, between which there should be no differences, and an independent variable (treatment) is then introduced to see whether the groups show real differences, which would be attributable to the introduced X variable. A real difference implies the samples are from different populations (with different μ).

• This is the Completely Randomized Two-Group Design.

For example, we can take a random set of independent voters who have not made up their minds about whom to vote for in the 2004 election. But we have another suspicion:

H1: watching campaign commercials increases consumption of Twinkies (snackie cakes), or μ1 ≠ μ2

Null is μ1 = μ2

After one group watches the commercials, but not the other, we measure Twinkie intake. We find that the group exposed to political commercials indeed ate more Twinkies. We thus conclude that political advertising leads to obesity.

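Two Sample Difference of Means T-Test

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\right]\left[\frac{n_1+n_2}{n_1 n_2}\right]}} $$

$$ S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} $$

= pooled variance of the two groups

$$ \frac{n_1+n_2}{n_1 n_2} $$

= sample-size term that scales the pooled variance; the square root of the product of the two brackets is the standard error of the difference, built on the common standard deviation of the two groups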

Two Sample Difference of Means T-Test

• The numerator of the equation captures the difference in means, while the denominator captures the pooled variation within the two groups, scaled by their sample sizes.

• Important point: of interest is the difference between the sample means, not between sample and population means. However, rejecting the null means that the two groups under analysis have different population means.

An example

• Test on GRE verbal test scores by gender:

Females: mean = 50.9, variance = 47.553, n = 6

Males: mean = 41.5, variance = 49.544, n = 10

$$ t = \frac{50.9 - 41.5}{\sqrt{\left[\frac{(6-1)47.553 + (10-1)49.544}{6+10-2}\right]\left[\frac{6+10}{6(10)}\right]}} $$

$$ t = \frac{9.4}{\sqrt{48.826\,(.26667)}} $$

$$ t = \frac{9.4}{\sqrt{13.02}} $$

$$ t = \frac{9.4}{3.608} = 2.605 $$

Now what do we do with this obtained value?
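The GRE computation can be reproduced with a minimal sketch of the pooled-variance formula. The function name and argument order are illustrative choices; the inputs are the group means, variances, and sizes given in the example. Computed from the rounded variances shown, the pooled variance comes out near 48.83 rather than 48.826, a small difference that presumably reflects rounding and does not change t at three decimals.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic with a pooled variance estimate."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Scale the pooled variance by (n1 + n2) / (n1 * n2), then take the root.
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    t = (mean1 - mean2) / std_error
    df = n1 + n2 - 2
    return t, df, pooled_var

# Females: mean 50.9, variance 47.553, n 6; males: mean 41.5, variance 49.544, n 10.
t, df, pooled_var = pooled_t(50.9, 47.553, 6, 41.5, 49.544, 10)
print(f"pooled variance = {pooled_var:.3f}, t = {t:.3f}, df = {df}")
```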


Steps of Testing and Significance

1. Statement of null hypothesis: if there is not one then how can you be wrong?

2. Set Alpha Level of Risk: .10, .05, .01

3. Selection of appropriate test statistic: T-test, chi2, regression, etc.

4. Computation of statistical value: get obtained value.

5. Determine the critical value needed to reject the null hypothesis: done for you for most methods in most statistical packages.


Steps of Testing and Significance

6. Compare the obtained value with the critical value.

7. If the obtained value is more extreme than the critical value, you may reject the null hypothesis. In other words, you have significant results.

8. If point seven above is not true, that is, the obtained value is less extreme than the critical value, then the null is not rejected.

The critical values are set by moving toward the tails of the distribution. The larger the alpha level, the more area under the tail counts as the rejection region, and the less extreme the critical value.

Also, hypothesis testing can entail a one- or two-tailed test, depending on whether a hypothesis is directional (increase/decrease) in nature.
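To make the tail areas concrete, the sketch below computes critical values for the three alpha levels mentioned in the steps. It uses the standard normal distribution as a stand-in, because Python's standard library has no t distribution; t critical values are somewhat larger for small samples. The helper name `critical_z` is an assumption of this example.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Critical value of the standard normal for a given alpha level."""
    # A two-tailed test splits alpha between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}: one-tailed {critical_z(alpha, False):.3f}, "
          f"two-tailed {critical_z(alpha, True):.3f}")
```

For the same alpha, the one-tailed critical value is less extreme than the two-tailed one, because the whole rejection area sits in a single tail.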


Steps of Testing and Significance

• The curve represents all of the possible outcomes for a given hypothesis.

• In this manner we move from talking about a distribution of data to a distribution of potential values for a sample of data.

GRE Verbal Example

Obtained Value: 2.605

Critical Value?

Degrees of Freedom: number of cases left after subtracting 1 for each sample.

Is the null hypothesis supported?

Answer: No. Women scored higher on the verbal test and the difference is statistically significant, so the null is rejected. This means that the mean scores of each gender as a population are different.
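The comparison for the GRE example can be sketched as follows. The critical value 2.145 is taken from a t table (two-tailed, α = .05, 14 degrees of freedom) and is an assumption about which alpha level and tails apply; the slide does not state them.

```python
def reject_null(obtained, critical, two_tailed=True):
    """True when the obtained value is more extreme than the critical value."""
    if two_tailed:
        return abs(obtained) > critical
    return obtained > critical

n_females, n_males = 6, 10
df = (n_females - 1) + (n_males - 1)  # subtract 1 for each sample

# 2.145 is the tabled two-tailed critical t for alpha = .05 and 14 df.
critical_t = 2.145
obtained_t = 2.605

print(f"df = {df}, reject null: {reject_null(obtained_t, critical_t)}")
```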


Let’s try another sample

• D:\POLS 5300 FA04\Comparing Means examples.xls

• Type in the data in SPSS

Paired T-Tests

• We use Paired T-Tests, a test of dependence, to examine a single sample of subjects/units under two conditions, such as in a pretest-posttest experiment.

• For example, we can examine whether a group of students improves if they retake the GRE exam. The T-test examines whether there is any significant difference between the two sets of scores. If so, then possibly something like studying more made a difference.

$$ t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n-1}}} $$

ΣD = sum of the differences between the paired scores; (ΣD)² is that sum squared, while ΣD² is the sum of the squared differences.

n = number of pairs of observations
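A minimal sketch of the paired computation, assuming the formula t = ΣD / √((nΣD² − (ΣD)²)/(n − 1)). The five pretest/posttest scores are hypothetical and are not taken from the course data file.

```python
import math

def paired_t(before, after):
    """Paired t statistic: t = sum(D) / sqrt((n*sum(D^2) - sum(D)^2) / (n - 1))."""
    # D is computed as after - before, so a positive t means scores went up.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d_sq = sum(d * d for d in diffs)
    t = sum_d / math.sqrt((n * sum_d_sq - sum_d ** 2) / (n - 1))
    return t, n - 1

# Hypothetical pretest/posttest scores for five students (not from the slides).
pretest = [400, 450, 500, 520, 610]
posttest = [420, 440, 530, 560, 640]
t, df = paired_t(pretest, posttest)
print(f"t = {t:.3f}, df = {df}")
```

The degrees of freedom are the number of pairs minus one, since each pair contributes a single difference score.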


Paired T-Tests

• Unlike a test for independence, this test requires that the two groups/samples being evaluated are dependent upon each other.

• For example, we can use a paired t-test to examine two sets of scores across time as long as they come from the same students.

• If you are doing more than two groups, use ANOVA.


Let’s Go to SPSS

• Using the data from last time, we will now analyze the Pre-test/Post-test data for GRE exams.

• D:\POLS 5300 FA04\Comparing Means examples.xls

Paired Samples Statistics

                    Mean      N     Std. Deviation    Std. Error Mean
Pair 1  TESTSCR1    409.69    16    200.459           50.115
        TESTSCR2    448.88    16    152.679           38.170

Paired Samples Correlations

                               N     Correlation    Sig.
Pair 1  TESTSCR1 & TESTSCR2    16    .959           .000

Paired Samples Test

Pair 1  TESTSCR1 - TESTSCR2

Paired Differences:
  Mean                                                -39.19
  Std. Deviation                                      69.155
  Std. Error Mean                                     17.289
  95% Confidence Interval of the Difference, Lower    -76.04
  95% Confidence Interval of the Difference, Upper    -2.34
t                                                     -2.267
df                                                    15
Sig. (2-tailed)                                       .039

H0: μscr1 = μscr2, whereas the research hypothesis is H1: μscr1 ≠ μscr2
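The figures in the SPSS output can be cross-checked from its own summary numbers. This is a sketch under the assumption that the standard error is the standard deviation of the differences divided by √N, with the critical value 2.131 taken from a t table (two-tailed, α = .05, 15 degrees of freedom). Because the inputs are rounded, the interval should agree with SPSS only to within rounding.

```python
import math

# Values read from the SPSS "Paired Samples Test" output.
mean_diff = -39.19   # mean of TESTSCR1 - TESTSCR2
sd_diff = 69.155     # standard deviation of the differences
n = 16               # number of pairs

std_error = sd_diff / math.sqrt(n)
t = mean_diff / std_error
df = n - 1

# 2.131 is the tabled two-tailed critical t for alpha = .05 and 15 df.
t_crit = 2.131
ci_lower = mean_diff - t_crit * std_error
ci_upper = mean_diff + t_crit * std_error

print(f"SE = {std_error:.3f}, t = {t:.3f}, df = {df}")
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```

Since the interval excludes zero and the reported significance (.039) is below .05, the null of equal means is rejected at that level.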

Nonparametric Test of Chi2

• Used when too many assumptions are violated in T-Tests:

– Sample size too small to reflect population

– Data are not continuous and thus not appropriate for parametric tests based on normal distributions.

• Chi2 is another way of showing that some pattern in data is not created randomly by chance.

• Chi2 can be one or two dimensional.

Nonparametric Test of Chi2

• Again, the basic question is whether what you are observing in some given data was created by chance or through some systematic process.

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

O = observed frequency

E = expected frequency

Nonparametric Test of Chi2

• The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other. Our research hypothesis is that they are not equal.

Given the sample size, how many cases could we expect in each category (n/#categories)? Comparing the obtained value with the critical value yields a chi2 statistic and a probability (Pr.) of seeing results this extreme if they were produced by chance alone.
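A one-dimensional chi2 test against equal expected frequencies can be sketched as below. The three category counts are hypothetical, and the critical value 5.991 is the tabled chi-square for α = .05 with 2 degrees of freedom.

```python
def chi_square_equal(observed):
    """One-dimensional chi-square against equal expected frequencies."""
    n = sum(observed)
    expected = n / len(observed)  # n / #categories
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, len(observed) - 1

# Hypothetical counts in three categories (not from the slides).
observed = [30, 20, 10]
chi2, df = chi_square_equal(observed)

# 5.991 is the tabled critical chi-square for alpha = .05 and 2 df.
print(f"chi2 = {chi2:.2f}, df = {df}, reject null: {chi2 > 5.991}")
```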


Cross-Tabs and Chi2

• One often encounters chi2 with cross-tabulations, which are usually used descriptively but can be used to test hypotheses.

Party Affiliation * gender (past, present) Crosstabulation

Count
                           gender (past, present)
                           Fem     Male    Total
Party Affiliation   Dem    5       6       11
                    GOP    5       8       13
Total                      10      14      24

Chi-Square Tests

                                Value      df    Asymp. Sig. (2-sided)    Exact Sig. (2-sided)    Exact Sig. (1-sided)
Pearson Chi-Square              .120(b)    1     .729
Continuity Correction(a)        .000       1     1.000
Likelihood Ratio                .120       1     .729
Fisher's Exact Test                                                       1.000                   .527
Linear-by-Linear Association    .115       1     .735
N of Valid Cases                24

a. Computed only for a 2x2 table

b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.58.
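The Pearson chi-square for this crosstab can be recomputed by hand with a short sketch. Expected counts are taken as row total × column total / N. The p-value line relies on the fact that a chi-square with 1 degree of freedom is a squared standard normal, so it applies to 2x2 tables only. The function name is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts, with its p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, chi-square is a squared standard normal, so the
    # upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected

# Rows: Dem, GOP; columns: Fem, Male (counts from the crosstab).
chi2, p_value, min_expected = chi_square_2x2([[5, 6], [5, 8]])
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, min expected = {min_expected:.2f}")
```

With a probability this far above .05, the null hypothesis that party affiliation and gender are unrelated in these data is not rejected.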