
Sociology 601: Midterm review, October 15, 2009

• Basic information for the midterm
  – Date: Tuesday, October 20, 2009
  – Start time: 2 pm
  – Place: usual classroom, Art/Sociology 3221
  – Bring a sheet of notes, a calculator, and two pens or pencils
  – Notify me if you anticipate any timing problems

• Review for the midterm
  – terms
  – symbols
  – steps in a significance test
  – testing differences in groups
  – contingency tables and measures of association
  – equations

Important terms from chapter 1

Terms for statistical inference:
• population
• sample
• parameter
• statistic

Key idea: You use a sample to make inferences about a population.

Important terms from chapter 2

2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable

2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error

Key idea: Statistical inferences depend on measurement and sampling.

Important terms from chapter 3

3.1) Tabular and graphic description:
• frequency distribution
• relative frequency distribution
• histogram
• bar graph

3.2-3.4) Measures of central tendency and variation:
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile


Key ideas:

1.) Statistical inferences are often made about a measure of central tendency.

2.) Measures of variation help us estimate certainty about an inference.


Important terms from chapter 4

• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score

Key ideas:

1.) If we know what the population is like, we can predict what a sample might be like.

2.) A sample statistic gives us a best guess of the population parameter.

3.) If we work carefully, a sample can also tell us how confident to be about our sample statistic.
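To make the standard error concrete, here is the arithmetic using the same numbers that appear in the one-sample ttesti example later in this review (s = 100 standing in for σ, n = 100):

$$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{n}} = \frac{100}{\sqrt{100}} = 10$$

By the central limit theorem, the sampling distribution of $\bar{Y}$ is approximately normal with this standard error, which is what lets us attach a z-score to a sample mean.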


Important terms from chapter 5

• point estimator

• estimate

• unbiased

• efficient

• confidence interval

Key ideas:

1.) We have a standard set of equations we use to make estimates.

2.) These equations are used because they have specific desirable properties.

3.) A confidence interval provides your best guess of a parameter.

4.) A confidence interval also tells you how close that best guess (from 3.) will typically be to the parameter.
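As a sketch of ideas 3 and 4, here is a 95% confidence interval built from the A&F problem 6.8 numbers used later in this review (n = 100, $\bar{Y}$ = 508, s = 100), with $t_{.025,\,99} \approx 1.984$:

$$\bar{Y} \pm t_{.025,\,df}\,\frac{s}{\sqrt{n}} = 508 \pm 1.984 \times \frac{100}{\sqrt{100}} = (488.2,\ 527.8)$$

This matches the [95% Conf. Interval] column in the ttesti output below.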


Important terms from chapter 6

6.1 – 6.3) Statistical inference: Significance tests

• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic


Key Idea from chapter 6

A significance test is a ritualized way to ask about a population parameter.

1.) Clearly state assumptions

2.) Hypothesize a value for a population parameter

3.) Calculate a sample statistic.

4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.

5.) Decide whether the hypothesis can be thrown out.


More important terms from chapter 6

6.4, 6.7) Decisions and types of errors in hypothesis tests

• type I error

• type II error

• power

6.5-6.6) Small sample tests

• t-statistic

• binomial distribution

• binomial test

Key ideas:

1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.

2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
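As a sketch of the small-sample proportion tools, Stata has an immediate binomial test, bitesti; the numbers below (n = 20, 14 successes, null proportion .5) are invented for illustration:

. * hypothetical small-sample binomial test: n=20, 14 successes, p0=.5
. bitesti 20 14 0.5

bitesti returns exact one- and two-sided p-values from the binomial distribution, with no normality assumption.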


Symbols

Be able to recognize and use:
• $\alpha$ (alpha level)
• $Y_i$, $\bar{Y}$ (an observation; the sample mean)
• $H_0$, $H_a$ (null and alternative hypotheses)
• $df$, $n$, $z$, $t$ (degrees of freedom, sample size, and the z and t statistics)
• $s$, $s^2$, $\sigma$, $\hat{\sigma}$, $\hat{\sigma}^2$ (sample and population standard deviations and variances, and their estimates)
• $P$ (p-value)
• $\mu_0$, $\pi_0$ (hypothesized mean and proportion under $H_0$)
• $\hat{\pi}$, $\hat{\mu}$ (estimated population proportion and mean)

Significance Tests, Step 1: Assumptions

• An assumption that the sample was drawn at random.
  – This is pretty much a universal assumption for all significance tests.

• An assumption about whether the variable has two outcome categories (a proportion) or many intervals (a mean).

• An assumption that lets us treat the sampling distribution as normal. This assumption varies from test to test.
  – Some tests assume a normal population distribution.
  – Other tests assume different minimum sample sizes.
  – Some tests do not make this assumption.

• Declare an α level at the start, if you use one.

Significance Tests, Step 2: Hypothesis

• State the hypothesis as a null hypothesis.
  – Remember that the null hypothesis is about the population from which you draw your sample.

• Write the equation for the null hypothesis.

• The null hypothesis can imply a one- or two-sided test.
  – Be sure the statement and equation are consistent.


Significance Tests, Step 3: Test statistic

For the test statistic, write:
• the equation,
• your work, and
• the answer.

– Full disclosure maximizes partial credit.

– I recommend four significant digits at each computational step, but present three in the final answer.
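Here is what that looks like for a one-sample z-test, using the A&F problem 6.8 numbers from the ttesti example below (n = 100, $\bar{Y}$ = 508, s = 100, $\mu_0$ = 500):

$$z = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}} = \frac{508 - 500}{100/\sqrt{100}} = \frac{8}{10} = 0.800$$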


Significance Tests, Step 4: p-value

Calculate an appropriate p-value for the test-statistic.

– Use the correct table for the type of test;

– Use the correct degrees of freedom if applicable;

– Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
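If you have Stata at hand instead of a printed table, its distribution functions return the same p-values; the two calls below use test statistics computed elsewhere in this review:

. * two-sided p-value for t = 0.80 with df = 99 (cf. P > |t| = 0.4256 below)
. display 2*ttail(99, 0.80)

. * upper-tail p-value for z = 1.731 (cf. P > z = 0.0418 in the prtesti output)
. display 1 - normal(1.731)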


Significance Tests, Step 5: Conclusion

Write a conclusion:

– state the p-value and your decision to reject H0 or not;

– state what your decision means;

– discuss the substantive importance of your sample statistic.


Useful STATA outputs

• immediate test for sample mean using TTESTI:

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500

. ttesti 100 508 100 500, level(95)

One-sample t test

------------------------------------------------------------------------------
         |      Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

                          Ho: mean(x) = 500

 Ha: mean < 500            Ha: mean != 500           Ha: mean > 500
   t =  0.8000                t =  0.8000               t =  0.8000
 P < t = 0.7872           P > |t| = 0.4256           P > t = 0.2128

Useful STATA outputs

• immediate test for sample proportion using PRTESTI:

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                    x: Number of obs =        832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.             [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032              .4960864    .5639136
------------------------------------------------------------------------------
                        Ho: proportion(x) = .5

  Ha: x < .5                Ha: x != .5               Ha: x > .5
   z =  1.731                z =  1.731                z =  1.731
 P < z = 0.9582          P > |z| = 0.0835           P > z = 0.0418

Useful STATA outputs

• Comparison of two means using ttesti:

. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |      Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     4252        18.1    .1978304        12.9    17.71215    18.48785
       y |     6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |    11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |                -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

                 Ho: mean(x) - mean(y) = diff = 0

 Ha: diff < 0              Ha: diff != 0             Ha: diff > 0
   t = -48.8496              t = -48.8496              t = -48.8496
 P < t = 0.0000          P > |t| = 0.0000           P > t = 1.0000

Chapter 6: Significance Tests for a Single Sample

parameter     sample size   best test
mean          large         z-test for $\bar{Y} - \mu_0$
proportion    large         z-test for $\hat{\pi} - \pi_0$
mean          small         t-test for $\bar{Y} - \mu_0$
proportion    small         binomial test

Equations for tests of statistical significance

Large-sample mean:   $z = \dfrac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$

Large-sample proportion:   $z = \dfrac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}$

Small-sample mean:   $t = \dfrac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$

where $\hat{\sigma}_{\bar{Y}} = s/\sqrt{n}$ and $\sigma_{\hat{\pi}} = \sqrt{\pi_0(1-\pi_0)/n}$.

Chapter 7: Comparing scores for two groups

parameter     sample size   sampling scheme   best test
mean          large         independent       z-test for $\bar{Y}_2 - \bar{Y}_1$
proportion    large         independent       z-test for $\hat{\pi}_2 - \hat{\pi}_1$
mean          small         independent       t-test for $\bar{Y}_2 - \bar{Y}_1$
proportion    small         independent       Fisher's exact test
mean          large         dependent         z-test for $\bar{D}$
proportion    large         dependent         McNemar test
mean          small         dependent         t-test for $\bar{D}$
proportion    small         dependent         binomial test

Two Independent Groups: Large Samples, Means

7.1) Difference of two large-sample means:

$$z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.

• Equal variance assumption? NO
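Plugging the numbers from the two-sample ttesti output above into 7.1 reproduces Stata's test statistic (the sign flips because ttesti computes mean(x) − mean(y)):

$$z = \frac{(32.6 - 18.1) - 0}{\sqrt{\dfrac{12.9^2}{4252} + \dfrac{18.2^2}{6764}}} = \frac{14.5}{0.2968} \approx 48.85$$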


Two Independent Groups: Large Samples, Proportions

7.2) Difference of two large-sample proportions:

$$z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n_1} + \dfrac{\hat{\pi}(1-\hat{\pi})}{n_2}}}$$

where $\hat{\pi}$ is the proportion pooled across both samples.

• Equal variance assumption? YES (if the proportions are equal, then so are the variances).

• $df = n_1 + n_2 - 2$
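prtesti also has a two-sample immediate form; the sample sizes and proportions below are invented for illustration:

. * hypothetical two-sample test of proportions: n1=500 p1=.40, n2=600 p2=.46
. prtesti 500 .40 600 .46, level(95)

It reports the z statistic and one- and two-sided p-values for the difference in proportions.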


Two Independent Groups: Small Samples, Means

7.3) Difference of two small-sample means:

$$t\ (\text{or } z) = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\;\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Equal variance assumption? SOMETIMES (for ease); NO (in computer programs)

Two Independent Groups: Small Samples, Proportions

Fisher's exact test
• via Stata, SAS, or SPSS
• calculates the exact probability of all possible occurrences
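In Stata, Fisher's exact test can be run straight from cell counts with the immediate command tabi; the 2×2 counts below are made up for illustration:

. * hypothetical 2x2 table; the exact option requests Fisher's exact test
. tabi 8 2 \ 3 7, exact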

Dependent Samples

• Means:

$$t\ (\text{or } z) = \frac{\bar{D}}{\hat{\sigma}_{\bar{D}}} = \frac{\bar{D}}{s_D/\sqrt{n}}$$

• Proportions (McNemar test):

$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}$$
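Stata's immediate McNemar command takes the four cell counts of the paired 2×2 table; the counts below are invented for illustration:

. * hypothetical paired counts: both-yes, yes/no, no/yes, both-no
. mcci 45 12 5 38

mcci reports McNemar's χ² (the square of the z above) along with an exact p-value.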

Chapter 8: Analyzing associations

• Contingency tables and their terminology:
  – marginal distributions and joint distributions
  – the conditional distribution of R, given a value of E (as counts or percentages in A & F)
  – marginal, joint, and conditional probabilities (as proportions in A & F)

• "Are two variables statistically independent?"

Descriptive statistics you need to know

• How to draw and interpret contingency tables (crosstabs)

• Frequency and probability/percentage terms
  – marginal
  – conditional
  – joint

• Measures of relationships:
  – odds, odds ratios
  – gamma and tau-b

Observed and expected cell counts

• $f_o$, the observed cell count, is the number of cases in a given cell.

• $f_e$, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.

• $f_e = \dfrac{\text{row total} \times \text{column total}}{N}$
  – the equation for $f_e$ adjusts expected counts for rows or columns with small totals.
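A quick worked case with made-up totals: for a cell whose row total is 40 and whose column total is 50, in a table of N = 100 cases,

$$f_e = \frac{40 \times 50}{100} = 20$$

so independence predicts 20 cases in that cell, whatever $f_o$ turns out to be.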


Chi-squared test of independence

• Assumptions: two categorical variables, random sampling, $f_e \ge 5$

• Ho: the variables are statistically independent (crudely, the score on one variable is independent of the score on the other).

• Test statistic: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$

• p-value from the $\chi^2$ table, with $df = (r-1)(c-1)$

• Conclusion: reject or do not reject based on the p-value and your prior $\alpha$-level, if necessary. Then describe your conclusion.
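The whole test can be run from cell counts with tabi; the counts below are made up for illustration, and the expected option prints each $f_e$ beside its $f_o$:

. * hypothetical 2x2 table; chi2 requests the test, expected prints the fe values
. tabi 30 70 \ 45 55, chi2 expected

Here $df = (2-1)(2-1) = 1$.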

Probabilities, odds, and odds ratios.

• Given a probability, you can calculate an odds and a log odds.
  – odds = p / (1 − p)
    • at 50/50, odds = 1.0
    • range: 0 to ∞
  – log odds = log( p / (1 − p) ) = log(p) − log(1 − p)
    • at 50/50, log odds = 0.0
    • range: −∞ to +∞
  – odds ratio = [ p1 / (1 − p1) ] / [ p2 / (1 − p2) ]

• Given an odds, you can calculate a probability: p = odds / (1 + odds)
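A worked pass through these formulas, using the sample proportion from the prtesti example above (p1 = .53) and a made-up comparison proportion (p2 = .50):

$$\text{odds}_1 = \frac{.53}{.47} \approx 1.13, \qquad \text{odds}_2 = \frac{.50}{.50} = 1.0, \qquad \text{odds ratio} = \frac{1.13}{1.0} = 1.13$$

and going back, p = 1.13 / (1 + 1.13) ≈ .53.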


Measures of association with ordinal data

• concordant observations (C):
  – in a pair, one observation is higher on both x and y

• discordant observations (D):
  – in a pair, one observation is higher on x and lower on y

• ties:
  – in a pair, the same on x or the same on y

• gamma (ignores ties):

$$\gamma = \frac{C - D}{C + D}$$

• tau-b is a gamma that adjusts for "ties"
  – gamma often increases with more collapsed tables
  – gamma and tau-b both have standard errors in computer output
  – tau-b can be interpreted as a correlation coefficient
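In Stata, both measures come as options to tabulate; the variable names below are hypothetical:

. * hypothetical ordinal variables; gamma and taub request the measures
. tabulate educ_level income_level, gamma taub

Each is reported with its asymptotic standard error.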