Matters arising: 1. Summary of last week’s lecture 2. The exercises 3. Your queries

Upload: jose-mcallister

Post on 28-Mar-2015



Matters arising

1. Summary of last week’s lecture

2. The exercises

3. Your queries


The Pearson correlation (r)

The PEARSON CORRELATION is a measure of a supposed linear association between two variables.


Linear, but imperfect association

• If the scatterplot is elliptical in shape, a linear association is indicated.

• In psychology, all measurement is subject to random error.

• No association between measured variables is ever perfect.

• That is why the points do not all lie on a straight line.


The Pearson correlation

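r = SP / √(SS_X × SS_Y)

Numerator: the sum of products (SP) of the deviations of X and Y from their means.

Denominator: the square root of the product of the sums of squares (SS_X and SS_Y).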


Explanation

• The numerator of r is known as a SUM OF PRODUCTS (SP).

• It is the sum of products that captures the extent to which X and Y are associated, or CO-VARY.

• The sums of squares in the denominator merely constrain the range of variation of r.
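
As a minimal sketch of this arithmetic, the function below computes r as the sum of products divided by the square root of the product of the two sums of squares. The function name and the scores are illustrative assumptions, not the lecture's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: r = SP / sqrt(SS_x * SS_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of the deviations (the numerator of r).
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squares (the denominator keeps r between -1 and +1).
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores with a perfect linear relationship.
exposure = [1, 2, 3, 4, 5]
violence = [2, 4, 6, 8, 10]
r = pearson_r(exposure, violence)
```

With real data, where measurement error is present, the points scatter about the line and r falls short of 1 in absolute value.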


The sum of products captures covariation

• Points in the upper right quadrant have positive deviation products; points in the lower left also have positive deviation products (a minus times a minus is a plus).

• Points in the other two quadrants have negative products.

• Since the positive products predominate, we can expect the covariance to be large and positive.

• The negative products are small: the points are near the intersection of the mean lines.

[Figure: scatterplot with reference lines at the mean Preference score and the mean Actual Violence score]


An elliptical scatterplot

• This is fine.

• The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.


No association

• There is NO association between Z and Y.

• The high value of r is driven solely by the presence of a single OUTLIER.
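
The effect of a single outlier can be sketched numerically. In the hypothetical data below (not the lecture's Z and Y values), nine points form a square grid with a correlation of exactly zero; adding one distant point produces a value of r close to 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: sum of products over the root of the sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data: a 3 x 3 grid of points showing no association at all.
z = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
r_without_outlier = pearson_r(z, y)

# The same data with one extreme point added.
r_with_outlier = pearson_r(z + [20], y + [20])
```

Only the scatterplot reveals that the second value of r rests on one point.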


Anscombe’s rule

• When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question:

• “Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot? If the shape would remain essentially the same, the value of r accurately reflects the association between the variables”.


Summary

• The Pearson correlation r is a measure of the strength of a SUPPOSED linear relationship between 2 variables.

• It is one of the most widely used of statistical measures; but it is also one of the most misused.

• You should always try to see the scatterplot when interpreting a value of r.


Exercise

From the Violence data, obtain a scatterplot and calculate the Pearson correlation.


Direction of causation

• When we measure two variables and obtain the correlation between them, we nearly always do so because we believe that one variable X causes or influences the other Y.

• We have measured Exposure X and Violence Y because we have the hypothesis that X causes Y.


The scatterplot of Y against X

• If we believe that X causes Y, we want to “PLOT Y AGAINST X”.

• We want a scatterplot with Y on the vertical axis and X on the horizontal axis.

[Figure: scatterplot with points labelled Richard, John and Jim]


Ordering the plot


The default graph


The vertical scale

• Notice that the vertical axis begins at 3, rather than at zero.

• I like to see the whole scale on the vertical axis.

• Double-click on the graph to enter the Chart Editor.

• Double-click on the vertical axis to enter a dialog which will enable you to control the amount of the vertical scale that you can see.


Ordering the full Y scale

Uncheck Auto and enter zero into the Custom slot.


Final version


Why do I like to see the entire scale on the vertical axis?


Beware!

• Modern computing packages such as SPSS afford a bewildering variety of attractive graphs and displays to help you bring out the most important features of your results. You should certainly use them.

• But there are pitfalls awaiting the unwary.


Performance profiles

• We often want to see how mean performance varies (or not) over various treatment conditions.

• We might want to compare the performance of participants who have ingested different kinds (or dosages) of drugs with that of a comparison or control group.

• There is a set of methods known as Analysis of Variance (ANOVA) which enables us to do that.


Ordering a means plot


A picture of the results


The picture is false!

• The table of means shows minuscule differences among the five group means!

• The graph suggested that there were vast differences among the means!


A small scale view

• Only a microscopically small section of the scale is shown on the vertical axis.

• This greatly magnifies even small differences among the group means.
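
The magnification can be put into numbers. The sketch below uses hypothetical group means and axis limits (assumptions for illustration, not the lecture's output) and computes what fraction of the vertical axis is taken up by the spread of the means.

```python
def share_of_axis(means, axis_min, axis_max):
    """Fraction of the vertical axis spanned by the spread of the group means."""
    return (max(means) - min(means)) / (axis_max - axis_min)


# Hypothetical means for five groups; they differ by very little.
means = [10.0, 10.1, 10.2, 10.3, 10.4]

# Axis clipped tightly around the data, as a default graph might draw it.
truncated = share_of_axis(means, axis_min=9.9, axis_max=10.5)

# Axis starting at zero, showing the whole scale.
full = share_of_axis(means, axis_min=0.0, axis_max=10.5)
```

On the clipped axis the means span about two thirds of the height of the graph; on the full axis they span less than one twentieth of it.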


Putting things right

• Double-click on the image to get into the Chart Editor.

• Double-click on the vertical axis to access the scale specifications.


Putting things right …

• Uncheck the minimum value box and enter zero as the desired minimum point.

• Click Apply.


The true picture!


The true picture …

• The effect is dramatic.

• The profile now reflects the true situation.

• ALWAYS BE SUSPICIOUS OF GRAPHS THAT DO NOT SHOW THE COMPLETE VERTICAL SCALE!


Your queries

• Several of you have e-mailed me asking how to fit a line to a scatterplot.

• Last week, I said that an elliptical scatterplot indicated that the relationship between the variables was basically LINEAR.

• So we want the best-fitting straight line through the points.

• This is known as the REGRESSION LINE.
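
A minimal sketch of how such a line can be found, assuming that "best-fitting" means least squares (the usual criterion): the slope is the sum of products divided by the sum of squares of X, and the line passes through the point of the two means. The function name and data are illustrative.

```python
def regression_line(x, y):
    """Least-squares line of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sp / ss_x
    # The line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 1 + 2x.
slope, intercept = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
```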


Drawing the regression line through the points

Choose Fit Line at Total.

To leave the Chart Editor, choose Close from the Edit menu or double-click on the Viewer outside the rectangle around the figure.


Finding the value of r


Hypothesis testing

• In HYPOTHESIS TESTING, a proposition known as the NULL HYPOTHESIS (H0) is set up.

• H0 is the NEGATION of your scientific hypothesis.

• So if our scientific hypothesis is that there is an association, H0 says there’s NO association.


The p-value

• To test H0, we gather our data and calculate the value of a TEST STATISTIC.

• If the null hypothesis were true, how probable would a value of the test statistic at least as extreme as ours be?

• The answer is given by a probability known as the p-value.

• SPSS calls the p-value the ‘Sig.’, i.e., the SIGNIFICANCE PROBABILITY.


A “significant” result

• A SIGNIFICANCE LEVEL is a small probability accepted by convention as a criterion for a decision about a statistical test.

• Most commonly, the 0.05 significance level is accepted by psychologists.

• If the p-value of your test statistic is LESS than the 0.05 significance level, your result is said to be ‘significant beyond the 0.05 level’.


The result

• Report this result as follows:

• r(27) = 0.89; p < .01

[Annotated SPSS output: number of pairs, value of r, and the p-value, with the callout “Never report a p-value like this!”]

Report the p-value to 2 places of decimals: if it’s less than .01, use the inequality sign <.


Lecture 9

MORE ON ASSOCIATION


We have shown that there is a strong association between a child’s violence and the amount of violent screen material watched …


but have we really gathered evidence for the hypothesis that exposure to screen violence promotes actual violence?


Remember:

CORRELATION does not necessarily mean CAUSATION


One causal model

• The hypothesis implies this CAUSAL MODEL.

• The results are CONSISTENT with the hypothesis.

• The correlation may indeed arise because exposure to violence causes actual violence.


Another causal model

• The child’s violent tendencies and appetite for violence lead to his (or her) watching violent programmes as often as possible.

• This model is also consistent with the data.


A third causal model

• NEITHER variable causes the other.

• Both are determined by the behaviour of the child’s parents.


The choice

• Does exposure cause violence (top model)?

• Does violence lead to more exposure (middle model)?

• Are both exposure and violence caused by a third, background, variable (bottom model)?


A background variable

• Perhaps neither Exposure nor Actual violence causes the other.

• Perhaps they are caused by a background parental behaviour variable.

• We have data on such a variable.

• The background variable correlates highly with both Exposure and Actual violence.


Partial correlation

A PARTIAL CORRELATION is what remains of a Pearson correlation between two variables when the influence of a third variable has been removed, or PARTIALLED OUT.


Three variables

• Let X1, X2 and X3 be three variables.

• Let r12 be the Pearson correlation between X1 and X2.

• Let r(12.3) be the partial correlation between X1 and X2 when the covariation of each with X3 has been removed.


Partial correlation


Explanation

Removes the influence of the third variable.

Rescales with new variances, so that the range is as below.
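
The two steps can be sketched with the standard first-order partial correlation formula, r12.3 = (r12 − r13 × r23) / √[(1 − r13²)(1 − r23²)]. The numerator removes the influence of X3 and the denominator rescales the result to lie between −1 and +1. The function name and the correlations below are hypothetical, not the values from the Violence data.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Partial correlation between X1 and X2 with X3 partialled out."""
    # Numerator: remove the covariation that X1 and X2 each share with X3.
    numerator = r12 - r13 * r23
    # Denominator: rescale using the variance left over once X3 is removed.
    denominator = sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator


# Hypothetical case: X1 and X2 correlate 0.8, but each correlates 0.9 with X3.
r12_3 = partial_r(r12=0.8, r13=0.9, r23=0.9)
```

In this made-up case the strong correlation between X1 and X2 all but vanishes once X3 is partialled out, which is the pattern the ‘third party’ causal model predicts.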


Obtaining a partial correlation


The partial correlation

• The partial correlation fails to reach significance.

• Now that we have taken the background variable into consideration, we see that there is no significant correlation between Exposure and Actual violence.

• It appears that, of the three possible causal models, the ‘third party’ model gives the most convincing account of the data.


Levels of measurement

• There are three levels:

• 1. The SCALE level. The data are measures on an independent scale with units. Heights, weights, performance scores and IQs are scale data. Each score has ‘stand-alone’ meaning.

• 2. The ORDINAL level. Data in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed.

• 3. The NOMINAL level. Assignments to categories (so-many males, so-many females).


3. Nominal data

• NOMINAL data relate to qualitative variables or attributes, such as gender or blood group, and are merely records of CATEGORY MEMBERSHIP.

• Nominal data are merely LABELS: they may take the form of numbers, but such numbers are arbitrary code numbers representing, say, the different blood groups or different nationalities. ANY numbers will do, as long as they are all different.


A set of nominal data

• A medical researcher wishes to test the hypothesis that people with a certain type of body tissue (Critical) are more likely to show the presence of a potentially harmful antibody.

• Data are obtained on 79 people, who are classified with respect to 2 attributes:

– 1. Tissue Type;

– 2. Whether the antibody is present or absent.


The research question

• Do more of the people in the critical group have the antibody?

• We are asking whether there is an ASSOCIATION between the variables of category membership (tissue type) and presence/absence of the antibody.

• This is the SCIENTIFIC hypothesis.


The null hypothesis

• The NULL HYPOTHESIS is the negation of the scientific hypothesis.

• The null hypothesis states that there is NO association between tissue type and presence of the antibody.


Contingency tables (cross-tabulations)

• When we wish to investigate whether an association exists between qualitative or categorical variables, the starting point is usually a display known as a CONTINGENCY TABLE, whose rows and columns represent the categories of the qualitative variables we are studying.

• Contingency tables are also known as CROSS-TABULATIONS, or CROSSTABS.


The contingency table

• Is there an association between Tissue Type and Presence of the antibody?

• It looks as if the antibody is indeed more in evidence in the ‘Critical’ tissue group.


The null hypothesis

• The null hypothesis is the negation of our scientific hypothesis: it states that the two variables are INDEPENDENT.

• In other words, any differences in the relative incidence of the antibody in the different tissue groups have resulted from SAMPLING ERROR.


Expected cell frequencies

• The pattern of the OBSERVED FREQUENCIES (O) would suggest that there is a greater incidence of the antibody in the Critical tissue group.

• But the marginal totals showing the frequencies of the various groups in the sample also vary.

• What cell frequencies would we expect under the independence hypothesis?


Expected cell frequencies (E)

• According to the null hypothesis, the presence of the antibody and membership of a particular tissue type are independent events.

• The probability of the joint occurrence of independent events is the product of their separate probabilities.

• We find the expected frequencies (E) by multiplying together the marginal totals that intersect at the cells concerned and dividing by the total number of observations.


The expected frequencies

• To obtain, say, the value of E for the top left cell, multiply the intersecting marginal totals (36 and 22) and divide by 79 (the total frequency), obtaining (36×22)/79 = 10.03.

• In the Critical group, there seem to be large differences between O and E: fewer No’s than expected and more Yes’s.
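
The rule for E can be sketched as follows. The first calculation uses the marginal totals quoted above (36, 22 and 79); the small table passed to the function is hypothetical, since the full table of observed frequencies is not reproduced here.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under independence: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Top left cell of the lecture's table, from its marginal totals.
e_top_left = 36 * 22 / 79  # about 10.03

# Hypothetical 2 x 2 table of observed frequencies, for illustration only.
expected = expected_frequencies([[10, 20], [20, 10]])
```

Every expected frequency in the hypothetical table is 15, because its marginal totals are all equal.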


The chi-square (χ2) statistic

• We need a statistic which summarises the differences between O and E, so that a large value will cast doubt upon the null hypothesis of independence.

• Such a statistic is CHI-SQUARE (χ2).


Formula for chi-square

• Each term of chi-square expresses the square of the difference between O and E as a proportion of E, that is, (O – E)²/E.

• Add up these terms for all the cells in the contingency table.
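
As a sketch, chi-square is the sum over all cells of (O – E)²/E, with each E obtained from the marginal totals. The function name and the table are hypothetical and are not the lecture's tissue-type data.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under independence.
            e = row_totals[i] * col_totals[j] / n
            # Squared difference as a proportion of E.
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2 x 2 table: every E is 15, so each term is 25/15.
chi2 = chi_square([[10, 20], [20, 10]])
```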


The value of chi-square

There are 8 terms in the summation, but only the first two and the last are shown in the calculation below.


Degrees of freedom

• To decide whether a given value of chi-square is significant, we must specify the DEGREES OF FREEDOM df.

• If a contingency table has R rows and C columns, the degrees of freedom is given by

• df = (R – 1)(C – 1)

• In our example, R = 4, C = 2 and so

• df = (4 – 1)(2 – 1) = 3.


Significance

• SPSS will tell us that the p-value of a chi-square with a value of 10.655 in the chi-square distribution with three degrees of freedom is .014.

• We should write this result as: χ2(3) = 10.66; p = .01.

• Since the result is significant beyond the .05 level, we have evidence against the null hypothesis of independence and evidence for the scientific hypothesis.
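
The p-value can be checked without SPSS. The sketch below relies on a closed form for the upper tail of the chi-square distribution that holds only for three degrees of freedom; the function name is an illustrative assumption.

```python
from math import erfc, exp, pi, sqrt


def chi_square_p_df3(x):
    """Upper-tail probability of a chi-square value x on 3 degrees of freedom."""
    # Closed form valid only for df = 3.
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p = chi_square_p_df3(10.655)

# Compare with the conventional 0.05 significance level.
significant = p < 0.05
```

For other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the more usual choice.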


Summary

• This week I extended my discussion of statistical association to the topic of partial correlation.

• A partial correlation can help the researcher to choose from different causal models.

• I also considered the analysis of nominal data in the form of contingency tables.

• The chi-square statistic can be used to test for the presence of an association between qualitative or categorical variables.


Multiple-choice example


Multiple-choice example


Another example