A Review of Widely-Used Statistical Methods

Upload: dwain-norton

Post on 22-Dec-2015



REVIEW OF FUNDAMENTALS

When testing hypotheses, all statistical methods always test the null.

Null Hypothesis?
No difference/no relationship.

If we do not reject the null, conclusion?
No significant difference/relationship was found.

If we do decide to reject the null, conclusion?
A significant relationship/difference is found and reported.

o The observed relationship/difference is too large to be attributable to chance/sampling error.

How do we decide to reject/not reject the null?

Statistical tests of significance always test the null and always report α (sig. level): the probability of erroneously rejecting a true null based on sample data.

α represents the odds of being wrong if we decide to reject the null:

the probability that an apparent relationship/difference this large would arise from chance/sampling error alone when the null is in fact true and, thus,

the odds of being wrong if we report a significant relationship/difference.

Rule of thumb for deciding to reject/not reject the null?

STATISTICAL DATA ANALYSIS

COMMON TYPES OF ANALYSIS?

– Examine Strength and Direction of Relationships
  • Bivariate (e.g., Pearson Correlation, r)
    Between one variable and another: Y = a + b1 x1
  • Multivariate (e.g., Multiple Regression Analysis)
    Between one dep. var. and an independent variable, while holding all other independent variables constant:
    Y = a + b1 x1 + b2 x2 + b3 x3 + … + bk xk

– Compare Groups
  • Between Proportions (e.g., Chi-Square Test, χ²)   H0: P1 = P2 = P3 = … = Pk
  • Between Means (e.g., Analysis of Variance)   H0: µ1 = µ2 = µ3 = … = µk

Let’s first review some fundamentals.

Statistical Techniques and Levels of Measurement:

Remember: Level of measurement determines choice of statistical method.

                      INDEPENDENT:                 INDEPENDENT:
                      NOMINAL/CATEGORICAL          METRIC (ORDERED METRIC or HIGHER)

DEPENDENT: NOMINAL    * Chi-Square                 * Discriminant Analysis
                      * Fisher’s Exact Prob.       * Logit Regression

DEPENDENT: METRIC     * T-Test                     * Correlation (and Covariance) Analysis
                      * Analysis of Variance       * Regression Analysis

Correlation and Covariance: Measures of Association Between Two Variables

Often we are interested in the strength and nature of the relationship between two variables.

Two indices that measure the linear relationship between two continuous/metric variables are:
a. Covariance
b. Correlation Coefficient (Pearson Correlation)

Covariance

Covariance is a measure of the linear association between two metric variables (i.e., ordered metric, interval, or ratio variables).

Covariance (for a sample) is computed as follows:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$   (for samples)

Positive values indicate a positive relationship.

Negative values indicate a negative (inverse) relationship.

Covariance (s_xy) of Two Variables

Example: Golfing Study

A golf enthusiast is interested in investigating the relationship, if any, between golfers’ driving distance (x) and their 18-hole score (y). He uses the following sample data (i.e., data from n = 6 golfers) to examine the issue:

x = Average Driving Distance (yards)    y = Golfer’s Average 18-Hole Score
277.6                                   69
259.5                                   71
269.1                                   70
267.0                                   70
255.6                                   71
272.9                                   69

Covariance (s_xy) of Two Variables

Example: Golfing Study (n = 6)

x          y       (x_i - x̄)    (y_i - ȳ)    (x_i - x̄)(y_i - ȳ)
277.6      69       10.65        -1.0         -10.65
259.5      71       -7.45         1.0          -7.45
269.1      70        2.15         0             0
267.0      70        0.05         0             0
255.6      71      -11.35         1.0         -11.35
272.9      69        5.95        -1.0          -5.95

Average    267.0    70.0
Std. Dev.  8.2192   0.8944
Total                                         -35.40

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
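The deviation table above can be reproduced in a few lines. This is a minimal sketch in plain Python using the six golfers from the slide; the function name `sample_covariance` is an illustrative choice, not something from the original deck.

```python
# Golf data from the slide: x = driving distance (yards), y = 18-hole score.
x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]


def sample_covariance(xs, ys):
    """Sample covariance: sum of deviation cross-products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return cross_products / (n - 1)


s_xy = sample_covariance(x, y)
print(round(s_xy, 2))  # -7.08
```

The negative sign comes directly from the cross-products column: golfers who drive farther than average tend to score below average.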

Covariance

Example: Golfing Study

Covariance:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$

• What can we say about the relationship between the two variables?

The relationship is negative/inverse. That is, the longer a golfer’s driving distance is, the lower (better) his/her score is likely to be.

• How strong is the relationship between x and y?

Hard to tell; there is no standard metric to judge it by! Values of covariance depend on units of measurement for x and y.

WHAT DOES THIS MEAN?

Covariance

It means:
If driving distance (x) were measured in feet, rather than yards, even though it is the same relationship (using the same data), the covariance s_xy would have been much larger in magnitude. WHY?

Because x-values would be much larger, and thus $(x_i - \bar{x})$ values will be much larger which, in turn, will make $\sum (x_i - \bar{x})(y_i - \bar{y})$ much larger.

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$

SOLUTION: Correlation Coefficient comes to the rescue!

• Correlation Coefficient (r) is a standard measure/metric for judging strength of linear relationship that, unlike covariance, is not affected by the units of measurement for x and y.

This is why correlation coefficient (r) is much more widely used than covariance.
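The units argument can be checked numerically. The sketch below converts the slide's driving distances from yards to feet (1 yard = 3 feet) and compares covariance and r before and after; it assumes r is computed as s_xy / (s_x s_y), the sample formula used in this review, and the helper names are illustrative.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation (square root of the covariance of xs with itself)."""
    return math.sqrt(sample_cov(xs, xs))


def pearson_r(xs, ys):
    """Pearson correlation as covariance divided by the two standard deviations."""
    return sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))


yards = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
scores = [69, 71, 70, 70, 71, 69]
feet = [3 * d for d in yards]  # same golfers, different unit

cov_yards, cov_feet = sample_cov(yards, scores), sample_cov(feet, scores)
r_yards, r_feet = pearson_r(yards, scores), pearson_r(feet, scores)

print(round(cov_yards, 2), round(cov_feet, 2))  # covariance triples in magnitude
print(round(r_yards, 4), round(r_feet, 4))      # r is unchanged
```

Rescaling x by 3 scales both s_xy and s_x by 3, so the factor cancels in the ratio.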

Correlation Coefficient

Correlation Coefficient r_xy (Pearson/simple correlation) is a measure of linear association between two variables. It may or may not represent causation.

The correlation coefficient r_xy (for sample data) is computed as follows:

$r_{xy} = \frac{s_{xy}}{s_x s_y}$   (for samples)

where
s_xy = Covariance of x & y
s_x = Std. Dev. of x
s_y = Std. Dev. of y

Correlation Coefficient = r

Francis Galton (English researcher, inventor of fingerprinting, and cousin of Charles Darwin)

In 1888, plotted lengths of forearms and head sizes to see to what degree one could be predicted by the other.

Stumbled upon the mathematical properties of correlation plots (e.g., y intercept, size of slope, etc.).

RESULT: An objective measure of how two variables are “co-related”: the CORRELATION COEFFICIENT (Pearson Correlation), r. Assesses the strength of a relationship based strictly on empirical data, and independent of human judgment or opinion.

Correlation Coefficient (Pearson Correlation) = r

Karl Pearson, a Galton Student & the Founder of Modern Statistics

What do you use it for?

To examine:

a. Whether a relationship exists between two metric variables
   • e.g., income and education, or workload and job satisfaction
and
b. What the nature and strength of that relationship may be.

Range of Values for r?

Correlation Coefficient (Pearson Correlation) r_xy

-1 ≤ r ≤ +1

• r-values closer to -1 or +1 indicate stronger linear relationships.
• r-values closer to zero indicate a weaker relationship.

NOTE: Once r_xy is calculated, we need to see whether it is statistically significant (if using sample data).

• Null Hypothesis when using r?
H0: r = 0
There is no relationship between the two variables.

Correlation Coefficient (Pearson Correlation) r_xy

Example: Golfing Study

The same golfing sample is used here (n = 6 golfers; x = average driving distance in yards, y = average 18-hole score). Summary values from its deviation table: average of x = 267.0, average of y = 70.0, s_x = 8.2192, s_y = 0.8944, and Σ(x_i - x̄)(y_i - ȳ) = -35.40.

Correlation Coefficient (Pearson Correlation) r_xy

Example: Golfing Study

We had calculated sample Covariance s_xy to be:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$

Correlation Coefficient (Pearson Correlation) r_xy:

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(0.8944)} = -0.9631$

Conclusion?
Not only is the relationship negative, but also extremely strong!
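The calculation of r for the golfing data can be sketched as follows, using only the Python standard library. The significance check at the end is an assumption added for illustration: the slides note that r must be tested but do not show how, so this uses the common t statistic t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom, compared against the two-tailed 5% critical value for df = 4 taken from a standard t table.

```python
import math
import statistics

x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # driving distance (yards)
y = [69, 71, 70, 70, 71, 69]                    # 18-hole score
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
s_x = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
s_y = statistics.stdev(y)

r_xy = s_xy / (s_x * s_y)
print(round(s_x, 4), round(s_y, 4), round(r_xy, 4))  # 8.2192 0.8944 -0.9631

# Assumed significance check (not shown in the slides): t test of H0: r = 0.
t_stat = r_xy * math.sqrt(n - 2) / math.sqrt(1 - r_xy ** 2)
critical_t = 2.776  # two-tailed, alpha = 0.05, df = n - 2 = 4
significant = abs(t_stat) > critical_t
print(round(t_stat, 2), significant)
```

With only six golfers the test has few degrees of freedom, yet the correlation is strong enough that |t| comfortably exceeds the critical value.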

Correlation Coefficient (Pearson Correlation):

$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}}$

To understand the practical meaning of r, we can square it.

• What would r² mean/represent?
• e.g., r = 0.96 → r² = 92%

r² represents the proportion (%) of the total/combined variation in both x and y that is accounted for by the joint variation (covariation) of x and y together (x with y and y with x).

• r² always represents a %

• Why do we show more interest in r, rather than r²?
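The "covariation over total variation" reading of r² can be illustrated with the sums-of-deviations form of r. This is a sketch on the golfing data from earlier in the review; the variable names `covariation`, `variation_x` and `variation_y` are illustrative labels for the three sums.

```python
import math

x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]


def pearson_r(xs, ys):
    """r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [a - x_bar for a in xs]
    dy = [b - y_bar for b in ys]
    covariation = sum(p * q for p, q in zip(dx, dy))
    variation_x = sum(p * p for p in dx)
    variation_y = sum(q * q for q in dy)
    return covariation / math.sqrt(variation_x * variation_y)


r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))  # r is about -0.96, r squared about 0.93
```

Squaring discards the sign, which is one reason r itself is usually reported: it carries the direction of the relationship as well as its strength.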

Correlation Coefficient: Computation

r² = (Covariation of X and Y together) / (All of variation of X & Y combined)

Age    Blood Pressure
X      Y       X - X̄    Y - Ȳ    (X - X̄)(Y - Ȳ)    (X - X̄)²    (Y - Ȳ)²
4      12      -3       -4       12                9           16
6      19      -1        3       -3                1            9
9      14       2       -2       -4                4            4
.      .        .        .        .                .            .
.      .        .        .        .                .            .
X̄ = 7  Ȳ = 16                    ∑(X - X̄)(Y - Ȳ)   ∑(X - X̄)²   ∑(Y - Ȳ)²

$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}}$

NOTE: Once r is calculated, we need to see if it is statistically significant (if sample data). That is, we need to test H0: r = 0.

Correlation Coefficient?

Suppose the correlation between X (say, students’ GMAT scores) and Y (their 1st year GPA in MBA program) is r = +0.48 and is statistically significant. How would we interpret this?

a) GMAT score and 1st year GPA are positively related so that as values of one variable increase, values of the other also tend to increase, and

b) r² = (0.48)² = 23% of variations/differences in students’ GPAs are explained by (or can be attributed to) variations/differences in their GMAT scores.

Let’s now practice on SPSS

Menu Bar: Analyze, Correlate, Bivariate, Pearson

EXAMPLE: Using data in SPSS File Salary.sav we wish to see if beginning salary is related to seniority, age, work experience, and education


STATISTICAL DATA ANALYSIS

Chi-Square Test of Independence?

Developed by Karl Pearson in 1900.

Is used to compare two or more groups regarding a categorical characteristic.

That is, to compare proportions/percentages:
– Examines whether proportions of different groups of subjects (e.g., managers vs professionals vs operatives) are equal/different across two or more categories (e.g., males vs females).

Examines whether or not a relationship exists between two categorical/nominal variables (e.g., employee status and gender)
– A categorical DV and a categorical IV.
– EXAMPLE?

Is smoking a function of gender? That is, is there a difference between the percentages of males and females who smoke?

• Chi-Square Test of Independence

Research Sample (n = 100):

ID     Gender        Smoking Status
1      0 = Male      1 = Smoker
2      1 = Female    0 = Non-Smoker
3      1             1
4      1             0
5      0             0
.      .             .
.      .             .
.      .             .
100    1             0

Dependent variable (smoking status) and the independent variable (gender) are both categorical.

Null Hypothesis?
H0: There is no difference in the percentages of males and females who smoke/don’t smoke (i.e., Smoking is not a function of gender).

QUESTION: Logically, what would be the first thing you would do?

• Chi-Square Test of Independence

H0: There is no difference in the percentages of males and females who smoke (Smoking is not a function of gender).

H1: The two groups are different with respect to the proportions who smoke.

TESTING PROCEDURE AND THE INTUITIVE LOGIC:

Construct a contingency Table: Cross-tabulate the observations and compute Observed (actual) Frequencies (Oij):

             Male        Female      TOTAL
Smoker       O11 = 15    O12 = 25    40
Nonsmoker    O21 = 5     O22 = 55    60
TOTAL        20          80          n = 100
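Cross-tabulation amounts to counting (gender, smoking status) pairs. The sketch below uses `collections.Counter`; because the full 100-row listing is not reproduced in the slides, the `records` list is reconstructed from the four cell counts and is an assumption for illustration only.

```python
from collections import Counter

# Records rebuilt from the slide's cell counts (the raw 100-row data set is not given).
records = (
    [("Male", "Smoker")] * 15
    + [("Female", "Smoker")] * 25
    + [("Male", "Nonsmoker")] * 5
    + [("Female", "Nonsmoker")] * 55
)

observed = Counter(records)  # maps (gender, status) -> observed frequency O_ij
genders = ["Male", "Female"]
statuses = ["Smoker", "Nonsmoker"]

row_totals = {s: sum(observed[(g, s)] for g in genders) for s in statuses}
col_totals = {g: sum(observed[(g, s)] for s in statuses) for g in genders}
n = sum(observed.values())

# Print the contingency table with its marginal totals.
for s in statuses:
    print(s, [observed[(g, s)] for g in genders], row_totals[s])
print("TOTAL", [col_totals[g] for g in genders], n)
```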

Chi-Square Test of Independence

Next, ask yourself: What numbers would you expect to find in the table if you were certain that there was absolutely no difference between the percentages of males and females who smoked (i.e., if you expected the Null to be true)? That is, compute the Expected Frequencies (Eij).

             Male        Female      TOTAL
Smoker       O11 = 15    O12 = 25    40
Nonsmoker    O21 = 5     O22 = 55    60
TOTAL        20          80          n = 100

Hint: What % of all the subjects are smokers/non-smokers?

Chi-Square Test of Independence

If there were absolutely no differences between the two groups with regard to smoking, you would expect 40% of individuals in each group to be smokers (and 60% non-smokers).

Compute and place the Expected Frequencies (Eij) in the appropriate cells (e.g., E11 = 40% of 20 males = 8):

             Male                     Female                    TOTAL
Smoker       O11 = 15   (E11 = 8)     O12 = 25   (E12 = 32)     40
Nonsmoker    O21 = 5    (E21 = 12)    O22 = 55   (E22 = 48)     60
TOTAL        20                       80                        n = 100

NOW WHAT? What is the next logical step?
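The expected frequencies follow mechanically from the marginal totals: each Eij is (row total × column total) / n, which is the same as applying the overall 40%/60% split to each gender. A minimal sketch, with the table laid out as rows = smoker/nonsmoker and columns = male/female:

```python
observed = [
    [15, 25],  # smokers:    male, female
    [5, 55],   # nonsmokers: male, female
]

row_totals = [sum(row) for row in observed]        # [40, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [20, 80]
n = sum(row_totals)                                # 100

# Under the null, every column shares the same row proportions.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[8.0, 32.0], [12.0, 48.0]]
```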

Chi-Square Test of Independence

             Male                     Female                    TOTAL
Smoker       O11 = 15   (E11 = 8)     O12 = 25   (E12 = 32)     40
Nonsmoker    O21 = 5    (E21 = 12)    O22 = 55   (E22 = 48)     60
TOTAL        20                       80                        n = 100

Compare the Observed and Expected frequencies, i.e., examine the (Oij – Eij) discrepancies.

QUESTION: What can we infer if the observed/actual frequencies happen to be reasonably close (or identical) to the expected frequencies?

• Chi-Square Test of Independence

So, the key to answering our original question lies in the size of the discrepancies between observed and expected frequencies.

If the observed frequencies were reasonably close to the expected frequencies:
– Reasonably certain that no difference exists between percentages of males and females who smoke,
– Good chance that H0 is true
  • That is, we would be running a large risk of being wrong if we decide to reject it.

On the other hand, the farther apart the observed frequencies happen to be from their corresponding expected frequencies:
– The greater the chance that percentages of males and females who smoke would be different,
– Good chance that H0 is false and should be rejected
  • That is, we would run a relatively small risk of being wrong if we decide to reject it.

What is, then, the next logical step?

Chi-Square Test of Independence

Compute an Overall Discrepancy Index: One way to quantify the total discrepancy between observed (Oij) and expected (Eij) frequencies is to add up all cell discrepancies, i.e., compute Σ(Oij – Eij).

• Problem?

Positive and negative values of (Oij – Eij) RESIDUALS for different cells will cancel out.

• Solution?

Square each (Oij – Eij) and then sum them up, i.e., compute Σ(Oij – Eij)².

• Any Other Problems?

Value of Σ(Oij – Eij)² is impacted by sample size (n).

– For example, if you double the number of subjects in each cell, even though cell discrepancies remain proportionally the same, the above discrepancy index will be much larger and may lead to a different conclusion. Solution?
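Both problems with the naive discrepancy indices show up directly on the smoking table. The sketch below computes the plain sum of (O - E) and the sum of squared discrepancies, then repeats the calculation with every cell doubled; the helper names are illustrative.

```python
def expected_counts(observed):
    """Expected frequencies under the null: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def residuals(observed):
    """Flat list of (O - E) discrepancies, one per cell."""
    expected = expected_counts(observed)
    return [o - e for o_row, e_row in zip(observed, expected)
            for o, e in zip(o_row, e_row)]


observed = [[15, 25], [5, 55]]
doubled = [[2 * o for o in row] for row in observed]  # same proportions, twice the subjects

plain_sum = sum(residuals(observed))                   # positives and negatives cancel
squared_sum = sum(d * d for d in residuals(observed))
squared_sum_doubled = sum(d * d for d in residuals(doubled))

print(plain_sum, squared_sum, squared_sum_doubled)  # 0.0 196.0 784.0
```

Doubling every cell doubles each discrepancy, so the sum of squares grows fourfold even though the percentages in the table have not changed.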

Chi-Square Test of Independence

• Divide each (Oij – Eij)² value by its corresponding Eij value before summing them up across all cells
• That is, compute an index for average discrepancy per subject.

You have just developed the formula for the χ² statistic:

$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

χ² can be intuitively viewed as:
An index that shows how much the observed frequencies are in agreement with (or apart from) the expected frequencies (for when the null is assumed to be true).

So, let’s compute the χ² statistic for our example:

Chi-Square Test of Independence

             Male                     Female                    TOTAL
Smoker       O11 = 15   (E11 = 8)     O12 = 25   (E12 = 32)     40
Nonsmoker    O21 = 5    (E21 = 12)    O22 = 55   (E22 = 48)     60
TOTAL        20                       80                        n = 100

$\chi^2 = \frac{(15 - 8)^2}{8} + \frac{(25 - 32)^2}{32} + \frac{(5 - 12)^2}{12} + \frac{(55 - 48)^2}{48} = 12.76$
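The whole chain from observed counts to the χ² statistic fits in one small function. This is a sketch in plain Python; `chi_square_statistic` is an illustrative name, and the expected counts are derived from the marginal totals as described earlier.

```python
def chi_square_statistic(observed):
    """Pearson chi-square: sum over all cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under the null
            total += (o - e) ** 2 / e
    return total


chi2 = chi_square_statistic([[15, 25], [5, 55]])
print(round(chi2, 2))  # 12.76
```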

• Chi-Square Test of Independence

Let’s Review: Obtaining a small χ² value means?

– Observed frequencies are in close agreement with what we would expect them to be if there were no differences between our comparison groups.

– That is, there is a strong likelihood that no difference exists between the percentages of males and females who smoke.

– Hence, we would be running a significant risk of being wrong if we were to reject the null hypothesis. That is, α is expected to be relatively large.

• Therefore, we should NOT reject the null.

› NOTE: Smaller χ² values result in larger α levels (if n remains the same).

A large χ² value means?

Chi-Square Test of Independence

A large χ² value means:
– Observed frequencies are far apart from what they ought to be if the null hypothesis were true.

– That is, there is a strong likelihood for existence of a difference in the percentages of male and female smokers.

– Hence, we would be running a small risk of being wrong if we were to reject the null hypothesis. That is, α is likely to be small.

• Thus, we should reject the null.

› NOTE: Larger χ² values result in smaller α levels (if n remains the same).

But, how large is large?
For example, does χ² = 12.76 represent a large enough departure (of observed frequencies) from expected frequencies to warrant rejecting the null? Check out the associated α level! α reflects whether χ² is large enough to warrant rejecting the null.

• Chi-Square Test of Independence

Answer:
– Consult the table of probability distribution for the χ² statistic to see what the actual value of α is (i.e., what is the probability that our χ² value is not large enough to be considered significant).

– That is, look up the α level associated with your χ² value (under appropriate degrees of freedom).

• Degrees of Freedom: df = (r-1) (c-1)

where r and c are # of rows and columns of the contingency table.

df = (2 – 1) (2 – 1) = 1

• Chi-Square Test of Independence

From the table, the α level for χ² = 10.83 (with df = 1) is 0.001.

Our χ² = 12.76 > 10.83
QUESTION: α for our χ² = 12.76 will be smaller or greater than 0.001?

• Smaller than 0.001
• Therefore, if we reject the null, the odds of being wrong will be even smaller than 1 in 1000.

Can we afford to reject the null? Is it safe to do so?

CONCLUSION?

– % of males and females who smoke are not equal.
– That is, smoking is a function of gender.
– Can we be more specific?
  » Percentage of males who smoke is significantly larger than that of the females (75% vs. 31%, respectively)

• CAUTION: Select the appropriate percentages to report (Row% vs. Column%)
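The table lookup can be cross-checked numerically for the df = 1 case. The sketch below relies on a standard identity that is not part of the slides: a χ² variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(χ²/2)). It applies only to df = 1; the function name is illustrative.

```python
import math


def chi2_upper_tail_df1(chi2):
    """Upper-tail probability P(X > chi2) for a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)  # 1 for the 2 x 2 smoking table

alpha_at_critical = chi2_upper_tail_df1(10.83)  # table value quoted in the slides
alpha_ours = chi2_upper_tail_df1(12.76)         # our computed statistic

print(df, round(alpha_at_critical, 4), alpha_ours < 0.001)
```

For tables with df > 1 a printed χ² table or a statistics package is still needed, since this shortcut does not generalize.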

Chi-Square Test of Independence

             Male        Female      TOTAL
Smoker       O11 = 15    O12 = 25    40
Nonsmoker    O21 = 5     O22 = 55    60
TOTAL        20          80          n = 100

Percentage who smoke: 15 / 20 = 75% (Male), 25 / 80 = 31% (Female)

Phi (a non-parametric correlation for categorical data):

$\Phi = \sqrt{\chi^2 / N} = \sqrt{12.76 / 100} = 0.357$ (Note: sign is NA)
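The choice between row and column percentages, and the Phi calculation, can both be sketched in a few lines. The percentages are taken within each gender (the independent variable); the "share of smokers who are male" figure is included only to show what the other choice would report. The dictionary layout is an illustrative assumption.

```python
import math

counts = {
    "Male": {"Smoker": 15, "Nonsmoker": 5},
    "Female": {"Smoker": 25, "Nonsmoker": 55},
}

# Percentage who smoke within each gender: the comparison the test is about.
smoking_rate = {
    gender: cells["Smoker"] / (cells["Smoker"] + cells["Nonsmoker"])
    for gender, cells in counts.items()
}

# The other direction answers a different question: what share of smokers are male?
total_smokers = sum(cells["Smoker"] for cells in counts.values())
male_share_of_smokers = counts["Male"]["Smoker"] / total_smokers

chi2, n = 12.76, 100
phi = math.sqrt(chi2 / n)  # strength of association for a 2 x 2 table

print(smoking_rate, male_share_of_smokers, round(phi, 3))
```

Reporting 37.5% (males among smokers) instead of 75% versus 31% would describe the sample's composition rather than the effect of gender on smoking.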

Chi-Square Test of Independence

VIOLATION OF ASSUMPTIONS:

χ² test requires expected frequencies (Eij) to be reasonably large (a common rule of thumb is at least 5 in each cell). If this requirement is violated, the test may not be applicable.

SOLUTION:

– For 2 x 2 contingency tables (df = 1), use the Fisher’s Exact Probability Test results (automatically reported by SPSS). That is, look up the α of the Fisher’s exact test to arrive at your conclusion.

– For larger tables (df > 1), eliminate small cells by combining their corresponding categories in a meaningful way. That is, recode the variable that is causing small cells into a new variable with fewer categories and then use this new variable to redo the Chi-Square test.
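For readers without SPSS at hand, Fisher's exact test for a 2 x 2 table can be sketched with the standard library alone. The version below is one common two-sided definition (sum the hypergeometric probabilities of all tables, with the same margins, that are no more likely than the observed one); the small table [[3, 1], [1, 3]] is hypothetical data chosen for illustration, not taken from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical small sample: all four expected counts are 2, too small for chi-square.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

Because the probabilities are computed exactly from the margins, no minimum expected frequency is required.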

• Chi-Square Test of Independence

Let’s now use SPSS to do the same analysis!

Menu Bar: Analyze, Descriptive Statistics, Crosstabs

Statistics: Chi-Square, Contingency Coefficient.

Cells: Observed, Row/Column percentages (for the independent variable)

SPSS File: smoker

SPSS File: GSS93 Subset

Chi-Square Test of Independence

Suppose we wish to examine the validity of the “gender gap hypothesis” for the 1992 presidential election between Bill Clinton, George Bush, and Ross Perot.

SPSS File: Voter

Page 43: 1 A Review of Widely-Used Statistical Methods Widely-Used Statistical Methods

43

Correlation Coefficient (Pearson Correlation) = r

What do you use it for?

To examine whether a relationship exists between two metric variables (e.g., income and education, or workload and job satisfaction) and what the nature and strength of that relationship may be.

Range of Values for r?

-1 ≤ r ≤ +1

Null Hypothesis when using r?

H0: r = 0 (There is no relationship between the two variables.)

Karl Pearson, a Galton Student & the Founder of Modern Statistics


Correlation Coefficient: To understand the practical meaning of r, we can square it.

• What would r² mean/represent?

r² represents the proportion (%) of the total/combined variation in both x and y that is accounted for by the joint variation (covariation) of x and y together (x with y and y with x).

• r² always represents a %.

• How is it calculated?

r² = (Covariation of X and Y together) / (Total variation of X & Y combined)

• How do we measure/quantify variations?

$$r^2 = \frac{\left[\sum (x - \bar{x})(y - \bar{y}) / (n - 1)\right]^2}{\left[\sum (x - \bar{x})^2 / (n - 1)\right]\left[\sum (y - \bar{y})^2 / (n - 1)\right]}$$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

• Why do we show more interest in r, rather than r²?


Correlation Coefficient: Computation

r² = (Covariation of X and Y together) / (Total variation of X & Y combined)

X       Y        X – X̄    Y – Ȳ    (X – X̄)(Y – Ȳ)     (X – X̄)²     (Y – Ȳ)²
4       12       -3       -4       12                 9            16
6       19       -1        3       -3                 1            9
9       14        2       -2       -4                 4            4
.       .         .        .        .                 .            .
.       .         .        .        .                 .            .
.       .         .        .        .                 .            .
X̄ = 7   Ȳ = 16                     ∑(X – X̄)(Y – Ȳ)    ∑(X – X̄)²    ∑(Y – Ȳ)²

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

NOTE: Once r is calculated, we need to see if it is statistically significant (if sample data). That is, we need to test H0: r = 0.
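The column-by-column computation in the table can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are assumptions made for the example, not the data set behind the slide (whose full rows are not shown).

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores, one step per table column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]                      # X minus its mean
    dy = [y - mean_y for y in ys]                      # Y minus its mean
    covariation = sum(a * b for a, b in zip(dx, dy))   # sum of cross-products
    variation_x = sum(a * a for a in dx)               # sum of squared X deviations
    variation_y = sum(b * b for b in dy)               # sum of squared Y deviations
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical data for illustration only (NOT the slide's data set).
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 10]

r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

The (n – 1) terms are left out because they cancel between numerator and denominator, which is why the two forms of the formula give the same r.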

Correlation Coefficient?

Suppose the correlation between X (say, students’ GMAT scores) and Y (their 1st year GPA in the MBA program) is r = +0.48 and is statistically significant. How would we interpret this?

a) GMAT score and 1st year GPA are positively related, so that as values of one variable increase, values of the other also tend to increase, and

b) 23% of variations/differences in students’ GPAs are explained by (or can be attributed to) variations/differences in their GMAT scores (r² = 0.48² ≈ 0.23).

Let’s now practice on SPSS

Menu Bar: Analyze, Correlate, Bivariate, Pearson

Using data in SPSS File Salary.sav, we wish to see if beginning salary is related to seniority, age, work experience, and education.

STATISTICAL DATA ANALYSIS

COMMON TYPES OF ANALYSIS:

– Examine Strength and Direction of Relationships
  • Bivariate (e.g., Pearson Correlation, r)
    Between one variable and another: Y = a + b1 x1
  • Multivariate (e.g., Multiple Regression Analysis)
    Between one dep. var. and an independent variable, while holding all other independent variables constant:
    Y = a + b1 x1 + b2 x2 + b3 x3 + … + bk xk

– Compare Groups
  • Proportions (e.g., Chi Square Test, χ²)
  • Means (e.g., Analysis of Variance)

STATISTICAL DATA ANALYSIS

Chi-Square Test of Independence?

To examine whether proportions of different groups of subjects (e.g., managers vs. operatives) are equal/different across two or more categories (e.g., males vs. females).

To examine whether or not a relationship exists between two categorical/nominal variables (e.g., employee status and gender), that is, a categorical dependent variable and a categorical independent variable.

– EXAMPLE?

Is smoking a function of gender? That is, is there a difference between the percentages of males and females who smoke?

Chi-Square Test of Independence

Research Sample:

ID      Gender        Smoking Status
1       0 = Male      1 = Smoker
2       1 = Female    0 = Non-Smoker
3       1             1
4       1             0
5       0             0
.       .             .
.       .             .
.       .             .
100     1             0

The dependent variable (smoking status) and the independent variable (gender) are both categorical.

Null Hypothesis?
H0: There is no difference in the percentages of males and females who smoke (smoking is not a function of gender).

QUESTION: Logically, what would be the first thing you would do?

Chi-Square Test of Independence

H0: There is no difference in the percentages of males and females who smoke (smoking is not a function of gender).

H1: The two groups are different with respect to the proportions who smoke.

TESTING PROCEDURE AND THE INTUITIVE LOGIC:

Construct a contingency table: cross-tabulate the observations and compute the Observed (actual) Frequencies (Oij):

              Male        Female      TOTAL
Smoker        O11 = 15    O12 = 25    40
Nonsmoker     O21 = 5     O22 = 55    60
TOTAL         20          80          n = 100
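As a rough sketch of what cross-tabulating means, the snippet below counts the five records that are visible on the Research Sample slide (IDs 1 to 5 only, so the counts are not the full table above). The function name `crosstab` and the row/column layout are assumptions chosen to match the slide's table.

```python
from collections import Counter

# (gender, smoking) for IDs 1-5 as shown on the Research Sample slide.
# gender: 0 = male, 1 = female; smoking: 1 = smoker, 0 = non-smoker
records = [(0, 1), (1, 0), (1, 1), (1, 0), (0, 0)]

def crosstab(rows):
    """Return [[male smokers, female smokers], [male non-smokers, female non-smokers]]."""
    counts = Counter(rows)
    return [[counts[(gender, smoking)] for gender in (0, 1)] for smoking in (1, 0)]

table = crosstab(records)
for label, row in zip(("Smoker", "Nonsmoker"), table):
    print(label, row)
```

Running the same function over all 100 records would produce the observed frequencies Oij used in the rest of the example.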

Chi-Square Test of Independence

Next, ask yourself: What numbers would you expect to find in the table if you were certain that there was absolutely no difference between the percentages of males and females who smoked? That is, compute the Expected Frequencies (Eij).

              Male        Female      TOTAL
Smoker        O11 = 15    O12 = 25    40
Nonsmoker     O21 = 5     O22 = 55    60
TOTAL         20          80          n = 100

Hint: What % of all the subjects are smokers/non-smokers?

Chi-Square Test of Independence

If there were absolutely no differences between the two groups with regard to smoking, you would expect 40% of individuals in each group to be smokers (and 60% non-smokers).

Compute and place the Expected Frequencies (Eij) in the appropriate cells:

              Male        Female      TOTAL
Smoker        O11 = 15    O12 = 25    40
              E11 = 8     E12 = 32
Nonsmoker     O21 = 5     O22 = 55    60
              E21 = 12    E22 = 48
TOTAL         20          80          n = 100

NOW WHAT? What is the next logical step?
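The expected frequencies follow directly from the margins: each Eij is (row total × column total) / n. A minimal sketch, with `expected_frequencies` as an illustrative helper name:

```python
# Observed frequencies from the slide: rows = smoker, non-smoker; columns = male, female
observed = [[15, 25], [5, 55]]

def expected_frequencies(obs):
    """Expected cell counts under the null: row total * column total / n."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
print(expected)
```

For example, E11 = 40 × 20 / 100 = 8, which is the same as taking 40% of the 20 males.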

Chi-Square Test of Independence

              Male        Female      TOTAL
Smoker        O11 = 15    O12 = 25    40
              E11 = 8     E12 = 32
Nonsmoker     O21 = 5     O22 = 55    60
              E21 = 12    E22 = 48
TOTAL         20          80          n = 100

Compare the Observed and Expected frequencies, i.e., examine the (Oij – Eij) discrepancies.

QUESTION: What can we infer if the observed frequencies happen to be reasonably close (or identical) to the expected frequencies?

Chi-Square Test of Independence

If the observed frequencies were reasonably close to the expected frequencies:
– We could be reasonably certain that no difference exists between percentages of males and females who smoke,
– Good chance that H0 is true
  • That is, we would be running a large risk of being wrong if we decide to reject it.

On the other hand, the farther apart the observed frequencies happen to be from their corresponding expected frequencies:
– The greater the chance that percentages of males and females who smoke would be different,
– Good chance that H0 is false and should be rejected
  • That is, we would run a relatively small risk of being wrong if we decide to reject it.

So, the key to answering our original question lies in the size of the discrepancies between observed and expected frequencies. What is, then, the next logical step?

Chi-Square Test of Independence

Compute an Overall Discrepancy Index: To quantify the overall discrepancy between observed (Oij) and expected (Eij) frequencies, we can add up all our cell discrepancies, i.e., compute ∑(Oij – Eij).

• Problem?

Positive and negative values of the (Oij – Eij) RESIDUALS for different cells will cancel out.

• Solution?

Square each (Oij – Eij) and then sum them up, i.e., compute ∑(Oij – Eij)².

• Any Other Problems?

The value of ∑(Oij – Eij)² is impacted by sample size (n).

– For example, if you double the number of subjects in each cell, even though cell discrepancies remain proportionally the same, the above discrepancy index will be much larger and may lead to a different conclusion. Solution?
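Both problems can be seen numerically with the smoking example. The sketch below is illustrative only (helper names are assumptions): it shows that the raw residuals sum to zero, and that doubling every cell makes the sum of squared residuals four times larger even though the percentages are unchanged.

```python
def expected_frequencies(obs):
    """Expected cell counts under the null: row total * column total / n."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def residuals(obs):
    """Flat list of (O - E) for every cell."""
    exp = expected_frequencies(obs)
    return [o - e for o_row, e_row in zip(obs, exp) for o, e in zip(o_row, e_row)]

observed = [[15, 25], [5, 55]]
doubled = [[2 * o for o in row] for row in observed]   # same percentages, n = 200

res = residuals(observed)
sum_raw = sum(res)                                      # positives and negatives cancel
sum_sq = sum(d * d for d in res)                        # squaring removes the cancellation
sum_sq_doubled = sum(d * d for d in residuals(doubled))
print(sum_raw, sum_sq, sum_sq_doubled)
```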

Chi-Square Test of Independence

• Divide each (Oij – Eij)² value by its corresponding Eij value before summing them up across all cells.
• That is, compute the total discrepancy per subject index: ∑ (Oij – Eij)² / Eij

You have just developed the formula for the χ² statistic:

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

χ² can be intuitively viewed as an index that shows how much the observed frequencies are in agreement with (or apart from) the expected frequencies (when the null is assumed to be true).

So, let’s compute the χ² statistic for our example:

Chi-Square Test of Independence

              Male        Female      TOTAL
Smoker        O11 = 15    O12 = 25    40
              E11 = 8     E12 = 32
Nonsmoker     O21 = 5     O22 = 55    60
              E21 = 12    E22 = 48
TOTAL         20          80          n = 100

$$\chi^2 = \frac{(15 - 8)^2}{8} + \frac{(25 - 32)^2}{32} + \frac{(5 - 12)^2}{12} + \frac{(55 - 48)^2}{48} = 12.76$$
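The hand computation above can be checked with a short sketch. The function name `chi_square` is illustrative; it simply applies ∑(Oij – Eij)² / Eij to the observed table, with the expected counts derived from the margins.

```python
def chi_square(obs):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency Eij
            total += (o - e) ** 2 / e               # this cell's contribution
    return total

chi2 = chi_square([[15, 25], [5, 55]])
print(round(chi2, 2))
```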

Chi-Square Test of Independence

Let’s Review: Obtaining a small χ² value means?

– Observed frequencies are in close agreement with what we would expect them to be if there were no differences between our comparison groups.

– That is, there is a strong likelihood that no difference exists between the percentages of males and females who smoke.

– Hence, we would be running a significant risk of being wrong if we were to reject the null hypothesis. That is, α is expected to be relatively large.

  • Therefore, we should NOT reject the null.

A large χ² value means?

Chi-Square Test of Independence

A large χ² value means:

– Observed frequencies are far apart from what they ought to be if the null hypothesis were true.

– That is, there is a strong likelihood for existence of a difference in the percentages of male and female smokers.

– Hence, we would be running a small risk of being wrong if we were to reject the null hypothesis. That is, α is likely to be small.

  • Thus, we should reject the null.

But, how large is large? For example, does χ² = 12.76 represent a large enough departure (of observed frequencies) from expected frequencies to warrant rejecting the null?

Chi-Square Test of Independence

Answer:

– Consult the table of probability distribution for the χ² statistic to see what the actual value of α is (i.e., what is the probability that it is not large enough to be considered significant).

– That is, look up the α level associated with your χ² value (under appropriate degrees of freedom).

  • Degrees of Freedom: df = (r-1) (c-1), where r and c are the # of rows and columns of the contingency table.

    df = (2 – 1) (2 – 1) = 1
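Instead of a printed table, the α level can also be computed. The sketch below covers only the df = 1 case, where the upper-tail probability of χ² equals erfc(√(χ² / 2)) because a χ² variable with one degree of freedom is a squared standard normal. The function names are illustrative assumptions; for df > 1 a statistics package would be needed.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_alpha_df1(chi2):
    """Upper-tail probability of a chi-square value with df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))

df = degrees_of_freedom(2, 2)
alpha = chi_square_alpha_df1(12.76)               # our example
alpha_at_critical = chi_square_alpha_df1(10.83)   # tabled value for alpha = 0.001
print(df, alpha, alpha_at_critical)
```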

Chi-Square Test of Independence

From the table, the α level for χ² = 10.83 (with df = 1) is 0.001.

Our χ² = 12.76 > 10.83. QUESTION: If we decide to reject the null, will α be smaller or greater than 0.001?

• Smaller than 0.001.
• Therefore, if we reject the null, the odds of being wrong will be even smaller than 1 in 1000.

Can we afford to reject the null? Is it safe to do so?

CONCLUSION?

– The percentages of males and females who smoke are not equal.
– That is, smoking is a function of gender.
– Can we be more specific?
  » The percentage of males who smoke is significantly larger than that of the females (75% vs. 31%, respectively).

• CAUTION: Select the appropriate percentages to report (Row% vs. Column%).
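To see why the choice between Row% and Column% matters, the sketch below computes both for the smokers row of the example table. With gender (the independent variable) in the columns, the column percentages are the ones that answer the research question; variable names are illustrative.

```python
# rows = smoker, non-smoker; columns = male, female
observed = [[15, 25], [5, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Column %: within each gender, what share smokes? (the comparison reported above)
col_pct_smokers = [100 * observed[0][j] / col_totals[j] for j in range(2)]

# Row %: within smokers, what share is male vs. female? (a different question)
row_pct_smokers = [100 * observed[0][j] / row_totals[0] for j in range(2)]

print(col_pct_smokers, row_pct_smokers)
```

Reporting the row percentages here would suggest that most smokers are female, which is true of this sample but says nothing about which gender is more likely to smoke.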

Chi-Square Test of Independence

              Male             Female           TOTAL
Smoker        O11 = 15         O12 = 25         40
Nonsmoker     O21 = 5          O22 = 55         60
TOTAL         20               80               n = 100
% smokers     15 / 20 = 75%    25 / 80 = 31%

Phi (a non-parametric correlation for categorical data):

Φ = √(χ² / N) = √(12.76 / 100) = 0.357 (Note: the sign is not applicable)
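As a quick check of the Phi value for this example, assuming the usual definition Φ = √(χ² / N) for a 2 x 2 table (the function name is illustrative):

```python
import math

def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table: square root of chi-square divided by sample size."""
    return math.sqrt(chi2 / n)

phi = phi_coefficient(12.76, 100)
print(round(phi, 3))
```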

Chi-Square Test of Independence

VIOLATION OF ASSUMPTIONS:

The χ² test requires the expected frequencies (Eij) to be reasonably large (a common rule of thumb is at least 5 in every cell).

If this requirement is violated, the test may not be applicable.

SOLUTION:

– For 2 x 2 contingency tables (df = 1), use the Fisher’s Exact Probability Test results (automatically reported by SPSS).
  • That is, look up the α of the Fisher’s exact test.

– For larger tables (df > 1), eliminate small cells by combining their corresponding categories in a meaningful way.
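For readers curious about what Fisher’s exact test computes, here is a minimal sketch for a 2 x 2 table. It holds the margins fixed, uses the hypergeometric probability of each possible table, and sums the probabilities of all tables no more likely than the observed one (one common two-sided convention, assumed here; SPSS output may differ in presentation). The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact probability for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed table are counted.
    cutoff = p_observed * (1 + 1e-9)
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1)) if p <= cutoff)

# The smoking example (expected counts are large here, so this is for illustration).
p_smoking = fisher_exact_two_sided(15, 25, 5, 55)
print(p_smoking)
```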

Chi-Square Test of Independence

Let’s now use SPSS to do the same analysis!

Menu Bar: Analyze, Descriptive Statistics, Crosstabs

Statistics: Chi-Square, Contingency Coefficient.

Cells: Observed, Row/Column percentages (for the independent variable)

SPSS File: smoker

SPSS File: GSS93 Subset

Chi-Square Test of Independence

Suppose we wish to examine the validity of the “gender gap hypothesis” for the 1992 presidential election between Bill Clinton, George Bush, and Ross Perot.

SPSS File: Voter

Assignment #3

1. As a population demographer, you have long suspected that women’s fertility rate in different countries (fertility: average number of children born to a woman) would be related to male and female literacy rates (lit_male and lit_fema), access to health care, as characterized by number of hospital beds per 10,000 people (hospbed) and number of doctors per 10,000 people (docs), infant mortality rate (babymort: number of deaths per 1,000 live births), as well as male and female life expectancies (lifeexpm and lifeexpf). The data file “World95.sav” (in the U drive) contains 1995 population and socio-economic statistics from 109 different countries, including statistics on all of the above-mentioned variables. Please use the data to test your suspicion (i.e., see which of the above variables fertility rate is correlated with, and in what way).

Assignment #3

2. Suppose you are a medical researcher and you wish to examine whether there is a relationship between incidents of coronary heart disease (CHD) and family history of CHD. Specifically, you wish to know whether CHD has, in part, a genetic component. In other words, you wish to know if the incidence of CHD is proportionally higher among men with a family history of CHD than among men without such family history. Researchers in the Western Electric Study have collected data on such issues using a sample of 240 men, ½ with and ½ without prior incidents of CHD (variable chd). All these men are by now deceased. Included in the data set is information on whether or not each subject has had a history of CHD in his immediate family (variable famhxcvr), as well as the day of the week when the subject’s death occurred (variable dayofwk). The data is available in the electric.sav SPSS data file. (Note: Due to SPSS site license restrictions, this hyperlink will not work if you are off campus.)

a) Please conduct the appropriate statistical test to address the above research objective.

b) Suppose you have been noticing that for most people the earlier working days of the week (especially Mondays, Tuesdays, and Wednesdays) appear to be more stressful in comparison with Fridays, Saturdays, and Sundays. Also, suppose you have come across prior research that indicates stress and CHD tend to create a deadly combination. As such, you have recently begun to suspect (hypothesize) that a larger percentage of men with CHD, compared with those without CHD, tend to die during Mondays, Tuesdays, and Wednesdays (as opposed to Fridays, Saturdays, and Sundays). That is, you suspect that having CHD increases the likelihood of dying during the more stressful days of the week. Please perform the necessary analysis to verify the validity of your suspicion (plausibility of your hypothesis).


NOTE: If you examine the value labels for the variable dayofwk, you will see that it is coded as 1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday, 6=Friday, and 7=Saturday. Therefore, for part (b), you will need to create a new variable, i.e., recode dayofwk into a new dichotomous variable (say, deathday) that would represent death during Mondays, Tuesdays, and Wednesdays vs. Fridays, Saturdays, and Sundays. Notice that the subjects who died on Thursdays should not be included in the analysis (i.e., should not be represented in either of the two categories of days represented by the new variable). Also, make sure you properly define the attributes (e.g., label, value label, etc.) of this new variable (i.e., deathday).
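The logic of the recode can be sketched outside SPSS as follows. This only illustrates the mapping described in the note: the codes 1 and 0 for the two new categories, and the use of `None` for Thursday (treated as missing), are assumptions, and the actual recode should still be done in SPSS.

```python
# Day coding from the note: 1=Sunday, 2=Monday, ..., 7=Saturday
EARLY_WEEK = {2, 3, 4}   # Monday, Tuesday, Wednesday
LATE_WEEK = {6, 7, 1}    # Friday, Saturday, Sunday

def recode_deathday(day_code):
    """Return 1 for Mon-Wed, 0 for Fri-Sun, and None (excluded) for Thursday."""
    if day_code in EARLY_WEEK:
        return 1
    if day_code in LATE_WEEK:
        return 0
    return None   # Thursday (5): left out of the analysis

deathday = [recode_deathday(day) for day in range(1, 8)]
print(deathday)
```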

REMINDERS: For each analysis, include the Notes in the printout. Also, edit the first page of your first analysis output to include your name. Make sure that on your printout you explain your findings and conclusions. Be specific as to what parts of the output you have used, and how you have used them, to reach your conclusions.

Make sure that you tell the whole story and that your explanations of the findings are complete. For example, it is not enough to say that there is a significant relationship between characteristic A and characteristic B. You have to go on to indicate how the two characteristics are related and what that relationship really means.

Assignment #3

HYPOTHESIS TESTING

QUESTIONS OR COMMENTS?