Post on 13-Dec-2015

Review for Final Examination

COMM 550X, May 12, 11 am- 1pm Final Examination

 

Practice for the Mid-Term

• Multiple choice portion of the test: There will be 50 multiple choice questions chosen at random from this pool of possible test questions. Each item will be worth 1 point.

• SPSS DATA ANALYSIS: You will be tested in SPSS on bivariate correlation, multiple regression, and MANOVA/discriminant analysis. The questions will use the data sets statelevel.sav and NationsoftheWorldModified.sav. The questions will have point values as follows: bivariate correlation, 10 points; multiple regression, 18 points; MANOVA/discriminant analysis, 22 points

Sample Test Question for Bivariate Correlation (8 points)

Using the NationsoftheWorldModified.sav data set, test the hypothesis that there is a significant positive association between a country’s civil liberties score and the annual number of peace demonstrations in that country. Set your significance level (alpha) at .05. Report the obtained value of the test statistic, the N, the df and probability level, and whether or not you can reject the null hypothesis of no association between the two variables.

Testing the Hypothesis

You have been asked to see if there is a significant association between two variables. For tests where both variables are interval level or better and no causal relationship between the two is implied, the appropriate test statistic to compute is the bivariate correlation. You are looking for a significant level of Pearson’s r, the correlation coefficient.

• In SPSS Data Editor, open the NationsoftheWorldModified.sav data file

• Go to Analyze/Correlate/Bivariate and put the two variables, civil liberties score and number of peaceful political demonstrations, into the Variables window

• Select a one-tailed test (you do this because you have made a prediction about the direction of the relationship, that it will be positive) and flag significant correlations

• Under Correlation Coefficients select Pearson and click OK

• Compare your output to the next slide

SPSS Output for Bivariate Correlation

You only get a small amount of output for bivariate correlation. Note the correlation coefficient (.077), the sample size (N = 112) and the significance level (.208). The df is equal to N - 2 for Pearson’s r, so here df = 110.

Before you did the test, you set your significance level (alpha) to .05, so p (the probability level) needed to be smaller than .05 for you to reject the null hypothesis. But your obtained value of Pearson’s r has a significance level of .208. Consequently, you cannot reject the null hypothesis, and you are not able to confirm your research hypothesis that there is a significant positive association between a country’s civil liberties score and the number of its peaceful political demonstrations.

Correlations

                                                                    Civil liberties   Number of peaceful
                                                                    score             political demonstrations
Civil liberties score                         Pearson Correlation   1                 .077
                                              Sig. (1-tailed)       .                 .208
                                              N                     112               112
Number of peaceful political demonstrations   Pearson Correlation   .077              1
                                              Sig. (1-tailed)       .208              .
                                              N                     112               112
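As a cross-check on this output, the significance of Pearson’s r can be reproduced by hand from r and N alone, using t = r * sqrt(df / (1 - r^2)) with df = N - 2. The following is a minimal Python sketch, not part of the SPSS procedure. The function names are illustrative assumptions, and the tail probability comes from a simple numerical integration of the t density rather than from a statistics library. Because r is rounded to .077 in the output, the result should agree with the SPSS value only to about two decimal places.

```python
import math

def t_from_r(r, n):
    """Convert Pearson's r to a t statistic with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df

def t_pdf(x, df):
    """Density of Student's t distribution (log-gamma avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log(1.0 + x * x / df))

def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T > t) for t >= 0.

    Integrates the density over [0, t] with Simpson's rule (steps must be
    even) and subtracts the result from 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0

# Values taken from the Correlations table: r = .077, N = 112.
t_stat, df = t_from_r(0.077, 112)
p_value = one_tailed_p(t_stat, df)
reject_null = p_value < 0.05  # alpha = .05, one-tailed

print(f"t({df}) = {t_stat:.3f}, one-tailed p = {p_value:.3f}, reject H0: {reject_null}")
```

The one-tailed probability is only appropriate here because the direction of the association was predicted in advance and the obtained r has the predicted (positive) sign.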

Writing up your Result

“Bivariate correlation analysis was performed to test the hypothesis that a country’s civil liberties score was positively associated with its number of peaceful political demonstrations. The obtained value of Pearson’s r was .077 (N = 112, df = 110, p = .208, one-tailed test), which was not significant. Consequently, we cannot reject the null hypothesis that there is no association between a country’s civil liberties score and its number of peaceful political demonstrations, and our research hypothesis was not confirmed.” (Note: if the significance level had fallen below .05, then you would have confirmed your research hypothesis only if the sign of the association between the two variables was positive, as predicted, that is, if the obtained correlation coefficient was positive.)

Sample Test Question for Multiple Regression

You are asked to test the hypothesis that a country’s score on the civil liberties index is a function of a linear combination of three variables: (1) percentage of seats in the lower legislative house held by the largest party, (2) percentage in the work force who are women, and (3) percentage of voting age population who voted in the last election. You believe that these variables are of importance in the order listed above. Further, you expect that the sign of the first predictor, percentage of seats, will be negative, and the signs of the other two predictors will be positive. Test the hypothesis and then write an equation for predicting the score of a new case on the civil liberties index based on the three variables. Set your significance level (alpha) to .05. Report the test statistic, N, df, and obtained probability level, and all other statistics appropriate to determining whether or not you have used the procedure correctly. State whether or not your data support rejecting the null hypothesis that civil liberties is unrelated to the three variables, and confirming your research hypothesis.

Testing the Hypothesis

To test this hypothesis, you need a procedure which looks at the relationship between a single, interval or better level variable on the one hand and multiple interval level or better predictors on the other. This is multiple regression. Since your theory has given you a reason to order the importance of your predictors ahead of time, you choose a hierarchical regression analysis where you enter the variables into the regression equation in the order of their presumed importance.

SPSS Procedure for Multiple Regression

• Download the NationsoftheWorldModified.sav data file

• Go to Analyze/Regression/Linear

• Move civil liberties score into the Dependent box

• Now we are going to enter variables one at a time, in the order predicted by our theory. Move your first-to-enter variable, percentage of seats in the lower legislative house held by the largest party, into the Independent box and click Next

• Move your second-to-enter variable, percentage of the work force who are women, into the Independent box and click Next

• Finally, move your third-to-enter variable, percentage of the voting age population who voted in the last election, into the Independent box. DON’T click Next again

• Make sure the Enter option is selected under Method

• Under Statistics, select Estimates, Confidence Intervals, Model Fit, R squared change, Descriptives, Part and Partial Correlation, and Collinearity Statistics, and click Continue

• Under Options, check Include Constant in the Equation, click Continue and then OK. You are doing this so you will be able to write the equation for predicting new cases’ civil liberties scores from raw scores on the predictor variables

• Compare results to next slides

SPSS Output: The Variables and their Order of Entry

Look for this box to make sure you have done the hierarchical regression form of multiple regression and that your variables have been entered in the order predicted by your theory.

Variables Entered/Removed (b)

Model   Variables Entered                                               Variables Removed   Method
1       Percent of seats in lower legis hse held by largest party (a)   .                   Enter
2       Percent of labor force who are women (a)                        .                   Enter
3       Percent of voting age pop who voted in last election (a)        .                   Enter

a. All requested variables entered.

b. Dependent Variable: Civil liberties score

The Regression Model Summary Table

Next, look for your model summary. Note that there are three models examined, and the notes a, b, and c tell which of your predictors are in each model. Note that Model 1, with only the percent of seats in the lower legislative house variable entered, was significant (F = 52.544, p < .001), and when the percentage of labor force who are women variable was added in Model 2, the increase in R square, the proportion of variance accounted for, was significant (F = 6.346, p = .014). Thus the two-variable model is significantly correlated with Y. Note that Model 3 didn’t change R square significantly (F = .525, p = .471), that is, it didn’t improve prediction significantly, so you really don’t need the third predictor, percent of voting age population who voted in last election. You choose Model 2.

Model Summary

                                                                        Change Statistics
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .625a   .391       .383                1.277                        .391              52.544     1     82    .000
2       .659b   .435       .421                1.237                        .044              6.346      1     81    .014
3       .662c   .438       .417                1.240                        .004              .525       1     80    .471

a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party

b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women

c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election

Regression Statistics; R and R Square

Note the statistics for the model you have chosen, Model 2. The multiple correlation R between civil liberties score and the two predictors is .659. The proportion of variance in the civil liberties score accounted for by the combination of the two variables (R square) is .435, or 43.5%.


Overall Significance of the Regression Equation

Look in the ANOVA table to get the overall F value for the model you have chosen. The F (2, 81) value for the two-variable combination of percent of seats held by largest party and percent of labor force who are women is 31.158, p < .001.

ANOVA (d)

Model                Sum of Squares   df   Mean Square   F        Sig.
1       Regression   85.620           1    85.620        52.544   .000a
        Residual     133.618          82   1.629
        Total        219.238          83
2       Regression   95.327           2    47.664        31.158   .000b
        Residual     123.911          81   1.530
        Total        219.238          83
3       Regression   96.135           3    32.045        20.825   .000c
        Residual     123.103          80   1.539
        Total        219.238          83

a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party

b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women

c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election

d. Dependent Variable: Civil liberties score
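The overall F, R square, and F change values can all be recovered from the sums of squares in the ANOVA table, which is a useful check that you are reading the right rows. Below is a minimal Python sketch under stated assumptions: the function and constant names are illustrative, and N = 84 is inferred from the total df of 83 rather than given directly in the output.

```python
# Sums of squares copied from the ANOVA table.
SS_TOTAL = 219.238
N_CASES = 84  # assumption: inferred from total df = N - 1 = 83

# Regression sum of squares for each model; the key is also the number
# of predictors in that model (1, 2, or 3).
SS_REGRESSION = {1: 85.620, 2: 95.327, 3: 96.135}

def residual_mean_square(model):
    """Residual SS divided by residual df (N - k - 1)."""
    return (SS_TOTAL - SS_REGRESSION[model]) / (N_CASES - model - 1)

def overall_f(model):
    """Overall F: regression mean square over residual mean square."""
    return (SS_REGRESSION[model] / model) / residual_mean_square(model)

def r_square(model):
    """Proportion of variance in the DV accounted for by the model."""
    return SS_REGRESSION[model] / SS_TOTAL

def f_change(model):
    """F for the gain in explained SS when one predictor is added."""
    gained = SS_REGRESSION[model] - SS_REGRESSION[model - 1]
    return gained / residual_mean_square(model)

for m in (1, 2, 3):
    line = f"Model {m}: R square = {r_square(m):.3f}, overall F = {overall_f(m):.3f}"
    if m > 1:
        line += f", F change = {f_change(m):.3f}"
    print(line)
```

Small differences in the third decimal place are expected, since the table entries are themselves rounded.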

Standardized and Unstandardized Coefficients; Multicollinearity

Continue to examine your output. Note the standardized and unstandardized coefficients. You can use the unstandardized coefficients to write the regression equation Y = 6.194 - .039 (percent of seats held by largest party) + .034 (percent of labor force who are women). You can use the standardized coefficients to compare the relative contributions of percent of seats and percent of women (-.620 and .210, respectively), and note that both standardized coefficients were significantly different from zero. Note also that the sign of the standardized coefficient for percentage of seats was negative, as predicted by your theory, and that the sign of the other variable was positive, as predicted. You can also report your tolerance and VIF statistics, which suggest that multicollinearity was not a problem (tolerance is 1.0, and VIF is 1.0, nowhere near 10).

Coefficients (a)

                                                                    Unstandardized        Standardized                       95% Confidence
                                                                    Coefficients          Coefficients                       Interval for B              Correlations                    Collinearity Statistics
Model   Term                                                        B        Std. Error   Beta           t        Sig.       Lower Bound   Upper Bound   Zero-order   Partial   Part     Tolerance   VIF
1       (Constant)                                                  7.298    .358                        20.367   .000       6.585         8.011
        Percent of seats in lower legis hse held by largest party   -.040    .005         -.625          -7.249   .000       -.051         -.029         -.625        -.625     -.625    1.000       1.000
2       (Constant)                                                  6.194    .559                        11.078   .000       5.081         7.306
        Percent of seats in lower legis hse held by largest party   -.039    .005         -.620          -7.425   .000       -.050         -.029         -.625        -.636     -.620    1.000       1.000
        Percent of labor force who are women                        .034     .014         .210           2.519    .014       .007          .061          .224         .270      .210     1.000       1.000
3       (Constant)                                                  5.776    .805                        7.179    .000       4.175         7.377
        Percent of seats in lower legis hse held by largest party   -.038    .006         -.594          -6.497   .000       -.049         -.026         -.625        -.588     -.544    .840        1.190
        Percent of labor force who are women                        .032     .014         .195           2.263    .026       .004          .060          .224         .245      .190     .942        1.062
        Percent of voting age pop who voted in last election        .006     .009         .068           .725     .471       -.011         .023          .347         .081      .061     .796        1.256

a. Dependent Variable: Civil liberties score
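To make the use of the unstandardized coefficients concrete, here is a minimal Python sketch of the Model 2 prediction equation, together with the relationship VIF = 1 / tolerance that links the two collinearity columns. The function names and the example case (50 percent of seats, 40 percent women in the labor force) are invented for illustration and are not taken from the data set.

```python
# Unstandardized coefficients (B) for Model 2, copied from the Coefficients table.
INTERCEPT = 6.194
B_SEATS = -0.039   # percent of seats in lower house held by largest party
B_WOMEN = 0.034    # percent of labor force who are women

def predict_civil_liberties(seats_pct, women_pct):
    """Predicted civil liberties score from raw scores on the two predictors."""
    return INTERCEPT + B_SEATS * seats_pct + B_WOMEN * women_pct

def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance

# Hypothetical new case; these predictor values are made up for illustration.
example_score = predict_civil_liberties(seats_pct=50.0, women_pct=40.0)
print(f"Predicted civil liberties score: {example_score:.3f}")

# Tolerance values reported for the three predictors in Model 3.
for tol in (0.840, 0.942, 0.796):
    print(f"tolerance = {tol:.3f} -> VIF = {vif(tol):.3f}")
```

Use the unstandardized B weights, not the betas, for prediction from raw scores; the betas apply only to standardized (z-score) predictors.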

Writing up Your Multiple Regression Results

“To test the hypothesis that a country’s civil liberties score was significantly related to a linear combination of the percentage of seats in the lower legislative house held by the largest party, the percentage of women in the labor force, and the percentage of the voting age population who voted in the last election, a multiple regression analysis was conducted. It was expected that the variable ‘percentage of seats held by the largest party’ would be negatively correlated with civil liberties score and the other two variables positively related. Results of the regression analysis indicated that a two-variable model which included percentage of seats in the lower legislative house held by the largest party and percentage of women in the labor force was significantly correlated with civil liberties scores (F (2, 81) = 31.158, p < .001). Addition of the third variable to the predictive model did not significantly increase the amount of variance accounted for in civil liberties score (F (1, 80) = .525, p = .471). The two-variable combination accounted for approximately 43.5% of the variance in civil liberties score. (continued on next slide)

Writing up Your Multiple Regression Results, cont’d

The best fitting regression equation for predicting civil liberties score from the two variables was civil liberties score = 6.194 - .039 (percent of seats held by largest party) + .034 (percent of labor force who are women). Significant standardized coefficients (βs) were obtained for the two variables (-.620 for percent of seats held by the largest party and .210 for percentage of women in the labor force), indicating that countries with higher scores on civil liberties would be likely to have a smaller percentage of seats in the lower legislative house held by the largest party and a larger percentage of women in the labor force, as predicted. Tolerance and VIF for the two-variable model were both equal to 1.0, indicating that multicollinearity was not an issue. Thus we can say that partial support for the hypothesis was obtained.”

Sample Test Question for Discriminant Analysis

Now we are going to test the following hypothesis: Southern and non-Southern states differ significantly on a combination of two types of traffic fatality: restrained and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states.

Testing the Hypothesis

Both discriminant analysis and MANOVA can be used in the case where you have two or more interval or better level predictors (DVs in the usage of MANOVA) and a nominal level grouping variable (IV in the usage of MANOVA). In this case we have a nominal level grouping variable (Southern/non-Southern) and interval level (actually ratio level) DVs or discriminating variables (traffic fatality variables).

We are going to use discriminant analysis to do the MANOVA, which (1) will give the identical result in the case where there are only two groups (two levels of the grouping variable) and (2) will let us practice doing discriminant analysis and evaluating the efficacy of the discriminant function. We are going to be looking for a significant level of Wilks’ lambda as an indicator of significant differences and support for the hypothesis. It is also necessary for the signs of the discriminant function coefficients to be in the same direction as that predicted for the two variables (a positive relationship with “Southernness”).

SPSS Procedure for Discriminant Analysis

• Download the file statelevel.sav

• In SPSS Data Editor, open the data file statelevel.sav

• Go to Analyze/Classify/Discriminant

• In the Group box put South (dummy) and set the maximum and minimum values to 1 and 0, respectively

• In the Independents, put restrained motor vehicle deaths per 100k and unrestrained motor vehicle deaths per 100k

• Make sure that the Enter Independents Together button is checked

• Under Statistics, check Means, Univariate ANOVAs, Box’s M, Unstandardized function coefficients, and click Continue

• Under Classify, select Summary Table and Territorial Map, click Continue, and then OK

• Compare your output to the next few slides

Examining Your SPSS Output: Group Means

First, look at the group means. Note that the means are in the expected direction, with levels of the two vehicle death variables higher in the South than in the non-South. Univariate F tests show that the differences are significant for both of the variables. So you have significant differences in the expected direction on both of your variables considered separately.

Group Statistics

                                                                                  Valid N (listwise)
South dummy   Variable                                Mean       Std. Deviation   Unweighted   Weighted
Non-south     Restrained motor veh deaths per 100k    8.62437    2.785566         34           34.000
              Unrestrained motor veh deaths per 100k  7.54033    4.380484         34           34.000
South         Restrained motor veh deaths per 100k    10.65369   2.182391         16           16.000
              Unrestrained motor veh deaths per 100k  10.69006   4.332577         16           16.000
Total         Restrained motor veh deaths per 100k    9.27376    2.756466         50           50.000
              Unrestrained motor veh deaths per 100k  8.54824    4.568596         50           50.000

Tests of Equality of Group Means

Variable                                 Wilks' Lambda   F       df1   df2   Sig.
Restrained motor veh deaths per 100k     .880            6.567   1     48    .014
Unrestrained motor veh deaths per 100k   .894            5.664   1     48    .021
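The univariate F tests and their Wilks’ lambda values can be reproduced from the group sizes, means, and standard deviations in the Group Statistics table alone. The sketch below is a minimal Python illustration of a one-way ANOVA computed from summary statistics; the function name is an assumption made for this example, and SPSS itself works from the raw data.

```python
def one_way_f(groups):
    """One-way ANOVA from summary statistics.

    groups is a list of (n, mean, sd) tuples, one per group.
    Returns (F, df1, df2, wilks_lambda), where the univariate Wilks'
    lambda is within-groups SS divided by total SS.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df1 = len(groups) - 1
    df2 = n_total - len(groups)
    f_value = (ss_between / df1) / (ss_within / df2)
    wilks_lambda = ss_within / (ss_between + ss_within)
    return f_value, df1, df2, wilks_lambda

# (n, mean, sd) for Non-south and South, copied from Group Statistics.
restrained = one_way_f([(34, 8.62437, 2.785566), (16, 10.65369, 2.182391)])
unrestrained = one_way_f([(34, 7.54033, 4.380484), (16, 10.69006, 4.332577)])

for label, (f_value, df1, df2, lam) in (("Restrained", restrained),
                                        ("Unrestrained", unrestrained)):
    print(f"{label}: F({df1}, {df2}) = {f_value:.3f}, Wilks' lambda = {lam:.3f}")
```

Note that a lambda near 1 means the grouping variable accounts for little of the variance in that predictor, which is why both univariate results are significant but modest.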

Box’s M Test for Equality of Group Covariances, and Significance of Wilks’ Lambda Overall Test

Next, look at your Box’s M test for the equality of group covariances. Box’s M is not significant, which means you have met one of the assumptions of MANOVA, that the group covariances for the levels of the grouping variable are equal. Now look at the value of Wilks’ lambda, and assess it for significance. Wilks’ lambda equals .783 and is significant by the Chi-square test. If we interpret this significant value of Wilks’ lambda in a MANOVA-like way, we have confirmed the hypothesis that Southern and non-Southern states differ significantly on the combination of the two motor vehicle predictors. (If we were interpreting this in a discriminant analysis type of way, we would say that the combination of two types of traffic related fatalities left .783 of the variance in Southern state-ness “unexplained”.) Wilks’ lambda is one of those measures you want to be close to zero, so this result is statistically significant, but not all that impressive.

Test Results

Box's M         1.912
F     Approx.   .602
      df1       3
      df2       19070.702
      Sig.      .613

Tests null hypothesis of equal population covariance matrices.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .783            11.478       2    .003

The Canonical Correlation

From your printout you will also want to report the canonical correlation between the combination of the two traffic fatality variables and South/non-South, which is .465. This represents the correlation of the grouping variable (South/non-South) with the new canonical variable formed by weighting the two original predictors (traffic fatalities belted and unbelted) by the weights from the discriminant function. You don’t usually report the equation for classifying new cases in the write-up when you are using MANOVA or discriminant analysis to test a hypothesis about group differences.

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .277a        100.0           100.0          .465

a. First 1 canonical discriminant functions were used in the analysis.
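With a single discriminant function, the eigenvalue, Wilks’ lambda, the canonical correlation, and the Chi-square test are all tied together, so any one of them can be checked against the others. The following minimal Python sketch shows those relationships using the values in the output. It relies on Bartlett’s Chi-square approximation for Wilks’ lambda, and the constant names are assumptions made for this illustration. Because the eigenvalue is printed to only three decimals, the recomputed values match the SPSS output only approximately.

```python
import math

# Values copied from the output; N is the listwise valid N (34 + 16).
EIGENVALUE = 0.277
N_CASES = 50
N_PREDICTORS = 2   # restrained and unrestrained motor vehicle deaths
N_GROUPS = 2       # South, non-South

# With one function, Wilks' lambda is 1 / (1 + eigenvalue).
wilks_lambda = 1.0 / (1.0 + EIGENVALUE)

# Canonical correlation: sqrt(eigenvalue / (1 + eigenvalue)).
canonical_r = math.sqrt(EIGENVALUE / (1.0 + EIGENVALUE))

# Bartlett's approximation: chi-square = -(N - 1 - (p + g) / 2) * ln(lambda).
chi_square = -(N_CASES - 1 - (N_PREDICTORS + N_GROUPS) / 2.0) * math.log(wilks_lambda)
df = N_PREDICTORS * (N_GROUPS - 1)

# For df = 2 only, the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi_square / 2.0)

print(f"Wilks' lambda = {wilks_lambda:.3f}")
print(f"Canonical correlation = {canonical_r:.3f}")
print(f"Chi-square({df}) = {chi_square:.3f}, p = {p_value:.3f}")
```

Note also that the squared canonical correlation and Wilks’ lambda sum to 1 here, which is the sense in which lambda is the “unexplained” share of variance.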

Canonical Discriminant Function Coefficients

                                         Function 1
Restrained motor veh deaths per 100k     .291
Unrestrained motor veh deaths per 100k   .163
(Constant)                               -4.093

Unstandardized coefficients

You would use these weights to classify new cases as South or non-South.
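As a rough illustration of how these weights classify a case, the sketch below computes a discriminant score and assigns the case to the group whose centroid on the function (-.354 for non-South, .751 for South, as reported in the Functions at Group Centroids output) is closer. This nearest-centroid rule is an assumption that corresponds to equal prior probabilities for the two groups; the function names are illustrative, and SPSS’s own classification may differ if priors are computed from group sizes.

```python
# Unstandardized canonical discriminant function coefficients from the output.
CONSTANT = -4.093
W_RESTRAINED = 0.291
W_UNRESTRAINED = 0.163

# Group centroids on the function, from "Functions at Group Centroids".
CENTROIDS = {"Non-south": -0.354, "South": 0.751}

def discriminant_score(restrained, unrestrained):
    """Score on the discriminant function from raw death rates per 100k."""
    return CONSTANT + W_RESTRAINED * restrained + W_UNRESTRAINED * unrestrained

def classify(restrained, unrestrained):
    """Assign the case to the nearest centroid (assumes equal priors)."""
    score = discriminant_score(restrained, unrestrained)
    return min(CENTROIDS, key=lambda group: abs(score - CENTROIDS[group]))

# Plugging in each group's means should reproduce that group's centroid,
# up to rounding in the printed coefficients.
non_south_score = discriminant_score(8.62437, 7.54033)
south_score = discriminant_score(10.65369, 10.69006)
print(f"Non-south means -> {non_south_score:.3f}, South means -> {south_score:.3f}")
print(classify(10.65369, 10.69006))
```

Scoring the group means is a quick way to confirm that you have copied the coefficients and the constant correctly.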

Discriminant Function Coefficients, Group Means on Functions

You would report the standardized discriminant function coefficients to show the relative contribution of each of the two predictors, which in this case are about equal, and both positively associated with the discriminant function, as required for support of your hypothesis. Then you would report the group means (centroids) on the discriminant function, which show that the South has a positive centroid (.751) on it (i.e., being a Southern state goes with higher vehicle deaths) and the non-South has a negative centroid (-.354).

Standardized Canonical Discriminant Function Coefficients

                                         Function 1
Restrained motor veh deaths per 100k     .760
Unrestrained motor veh deaths per 100k   .713

Functions at Group Centroids

South dummy   Function 1
Non-south     -.354
South         .751

Unstandardized canonical discriminant functions evaluated at group means

Classification Results

Finally, you would report the re-classification results (that is, the results of using the discriminant function coefficients to create a new, canonical variable out of the old predictors and using this new variable to re-classify cases as to South or non-South) and the most frequently occurring misclassifications; e.g., 78% of the cases were correctly re-classified based on the discriminant function. Slightly more errors proportionally were made re-classifying the Southern than the non-Southern cases.

Classification Results (a)

                                   Predicted Group Membership
                     South dummy   Non-south   South   Total
Original   Count     Non-south     27          7       34
                     South         4           12      16
           %         Non-south     79.4        20.6    100.0
                     South         25.0        75.0    100.0

a. 78.0% of original grouped cases correctly classified.
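The percentages in the classification table follow directly from the counts. A minimal Python sketch, with names chosen for this example only, that recomputes the overall and per-group hit rates:

```python
# Counts copied from the Classification Results table:
# outer key = actual group, inner key = predicted group.
COUNTS = {
    "Non-south": {"Non-south": 27, "South": 7},
    "South": {"Non-south": 4, "South": 12},
}

def group_hit_rate(actual):
    """Share of cases in one actual group that were classified correctly."""
    row = COUNTS[actual]
    return row[actual] / sum(row.values())

def overall_hit_rate():
    """Share of all cases that were classified into their actual group."""
    correct = sum(COUNTS[group][group] for group in COUNTS)
    total = sum(sum(row.values()) for row in COUNTS.values())
    return correct / total

print(f"Overall: {overall_hit_rate():.1%}")
for group in COUNTS:
    print(f"{group}: {group_hit_rate(group):.1%} correct")
```

The per-group rates show where the errors fall: the error rate is 1 minus the hit rate for each row of the table.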

Writing up your Discriminant Analysis Result

“A discriminant analysis was conducted to perform a multivariate analysis of variance test of the hypothesis that Southern states differ from non-Southern states on a linear combination of two types of traffic fatality, restrained motor vehicle accidents and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states. The obtained value of Wilks’ lambda, .783, was significant at p = .003 (Chi-square = 11.478, df = 2; Box’s M = 1.912, n.s.). The canonical correlation between the grouping variable and the new canonical variable composed of the two predictors was .465. Significant univariate differences of means between Southern and non-Southern states were also obtained for restrained motor vehicle accidents (F (1, 48) = 6.567, p = .014) and unrestrained motor vehicle accidents (F (1, 48) = 5.664, p = .021). Mean differences were in the expected direction: means for restrained motor vehicle accidents were 10.65 for Southern states and 8.62 for non-Southern states; means for unrestrained motor vehicle accidents were 10.69 for Southern states and 7.54 for non-Southern states.

Writing up Your Discriminant Analysis Result, cont’d

Table 1 presents the standardized discriminant function coefficients. Higher scores on the discriminant function corresponded to higher traffic fatality rates for both of the discriminating variables. Table 2 presents the group centroids on the discriminant function; the Southern states group had a high, positive centroid with respect to the function, corresponding to higher rates of traffic fatalities. Table 3 presents the results of the re-classification analysis, which shows that the discriminant function was successful in reclassifying 78% of the cases.”
