More on Two-Variable Data

Post on 20-Dec-2015

Page 1: More on Two-Variable Data. Chapter Objectives Identify settings in which a transformation might be necessary in order to achieve linearity. Use transformations

More on Two-Variable Data

Chapter Objectives

• Identify settings in which a transformation might be necessary in order to achieve linearity.

• Use transformations involving powers and logarithms to linearize curved relationships.

• Explain what is meant by a two-way table, and describe its parts.

• Give an example of Simpson’s Paradox.

• Explain what gives the best evidence for causation.

• Explain the criteria for establishing causation when experimentation is not feasible.

The Goal

• Our goal is to fit a model to curved data so that we can make predictions as we did in chapter 3.

• HOWEVER, the only statistical tool we have to fit a model is the least-squares regression model.

• THEREFORE, in order to find a model for curved data, we must first “straighten it out”….

Transforming Relationships

• Data that displays a curved pattern can be modeled by a number of different functions.

• Two most common:
– Exponential ($y = AB^x$)
– Power ($y = Ax^B$)

• Chapter 4 focuses on these two models

pp. 195 – 6

• Example 4.1

• Brain weight v. body weight

• Note about variables:
– Sometimes we wish to transform x, or y, or both x and y.
– Therefore we refer to variables generically as t.

Why

• Linear transformations cannot straighten a curved relationship between two variables.

• Because of this, we must resort to functions that are not linear.
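A minimal sketch of why a linear transformation cannot help, using hypothetical exponential data that is not from the slides: rescaling y leaves the correlation unchanged, while taking logarithms straightens the pattern. The helper `correlation` is written out here only to keep the example self-contained.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curved data: y grows exponentially with x.
xs = list(range(1, 11))
ys = [3 * 2 ** x for x in xs]

r_raw = correlation(xs, ys)

# A linear transformation of y (a change of units) leaves r unchanged,
# so the scatterplot is exactly as curved as before.
r_linear = correlation(xs, [5 + 0.1 * y for y in ys])

# A nonlinear transformation (the logarithm) straightens the pattern.
r_log = correlation(xs, [math.log10(y) for y in ys])

print(r_raw, r_linear, r_log)
```

The first two values agree, and the third is essentially 1 because log y is exactly linear in x for this made-up data set.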

A Note about Monotonic Functions

4.1

• A. $y = 2.54x$: monotonic increasing

• B. $y = 60/x$: monotonic decreasing

• C. circumference = π(diameter): monotonic increasing

• D. SquaredError = $(\text{time} - 5)^2$: not monotonic
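The four answers can be checked numerically. This is only a sketch: it tests each function on a grid of positive values (an assumed domain, since the slides do not state one), which is a sanity check rather than a proof. The helper `classify` is a hypothetical name.

```python
import math

def classify(f, xs):
    """Classify f as monotonic increasing, decreasing, or neither on the grid xs."""
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if all(d > 0 for d in diffs):
        return "monotonic increasing"
    if all(d < 0 for d in diffs):
        return "monotonic decreasing"
    return "not monotonic"

# Assumed domain: positive values from 0.5 to 20.0 in steps of 0.5.
grid = [0.5 * k for k in range(1, 41)]

results = {
    "A": classify(lambda x: 2.54 * x, grid),       # y = 2.54x
    "B": classify(lambda x: 60 / x, grid),         # y = 60/x
    "C": classify(lambda d: math.pi * d, grid),    # circumference = pi * diameter
    "D": classify(lambda t: (t - 5) ** 2, grid),   # squared error = (time - 5)^2
}

for label, verdict in results.items():
    print(label, verdict)
```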

Figure 4.5

• What can we learn?
– The graph of a linear function (power p = 1) is a straight line.
– Powers greater than 1 (like p = 2 and p = 4) give graphs that bend upward. The sharpness of the bend increases as p increases.
– Powers less than 1 but greater than 0 (like p = 0.5) give graphs that bend downward.
– Powers less than 0 (like p = -0.5 and p = -1) give graphs that decrease as x increases. Greater negative values of p result in graphs that decrease more quickly.
– Look at the p = 0 graph. You may be surprised that this is not the graph of $y = x^0$. Why not? The 0th power $x^0$ is just the constant 1, which is not very useful. The p = 0 entry in the figure is not constant; it is the logarithm, log x. That is, the logarithm fits into the hierarchy of power transformations at p = 0.
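One way to express the hierarchy in code is a single function that returns $x^p$ for nonzero p and the logarithm in the p = 0 slot. The name `power_transform` and the choice of base-10 logarithms are assumptions made for this sketch.

```python
import math

def power_transform(x, p):
    """Ladder-of-powers transformation: x**p, with log10(x) in the p = 0 slot."""
    if x <= 0:
        raise ValueError("x must be positive for power and log transformations")
    return math.log10(x) if p == 0 else x ** p

# Walk down the ladder for a few positive x values.
for p in (4, 2, 1, 0.5, 0, -0.5, -1):
    print(p, [round(power_transform(x, p), 3) for x in (1, 2, 4)])
```

For p greater than 0 the transformed values increase with x, and for p less than 0 they decrease, which matches the shapes described for Figure 4.5.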

pp. 201 - 202

• Example 4.2 runs through several steps from the ladder of power transformations.

• This emphasizes that the process can be one of
– (a) making a good guess, based on observations of a graph of the data, about the type of transformation needed and
– (b) trying several types of the transformation chosen.

• This can get tedious, so the next section introduces a more analytic approach.

• The first approach is to look for an exponential growth pattern, which has the advantage that it can be linearized by taking logarithms (of the response variable) to transform the data.

4.3

• Weight = $c_1(\text{height})^3$ and strength = $c_2(\text{height})^2$; therefore, strength = $c(\text{weight})^{2/3}$, where $c$ is a constant.
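The step from the two proportionalities to the 2/3 power can be written out as follows. This is a sketch of the algebra: height is eliminated by solving the weight relation for it, and the constant $c$ is expressed in terms of $c_1$ and $c_2$.

```latex
\begin{align*}
\text{weight} = c_1\,(\text{height})^{3}
  &\;\Longrightarrow\; \text{height} = \left(\frac{\text{weight}}{c_1}\right)^{1/3},\\
\text{strength} = c_2\,(\text{height})^{2}
  &= c_2\left(\frac{\text{weight}}{c_1}\right)^{2/3}
   = c\,(\text{weight})^{2/3},
  \qquad c = \frac{c_2}{c_1^{2/3}}.
\end{align*}
```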

4.4

• A graph of the power law $y = x^{2/3}$ shows that strength does not increase linearly with body weight, as would be the case if a person 1 million times as heavy as an ant could lift 1 million times more than the ant. Rather, strength increases more slowly. For example, if weight is multiplied by 1000, strength will increase by a factor of $(1000)^{2/3} = 100$.

4.5

• Let y = average heart rate and x = body weight.

• Kleiber’s law says that total energy consumed is proportional to the three-fourths power of body weight, that is, Energy = $c_1 x^{3/4}$.

• But total energy consumed is also proportional to the product of the volume of blood pumped by the heart and the heart rate, that is, Energy = $c_2(\text{volume})\,y$.

• The volume of blood pumped by the heart is proportional to body weight, that is, Volume = $c_3 x$.

• Putting these three equations together yields $c_1 x^{3/4} = c_2(\text{volume})\,y = c_2(c_3 x)\,y$.

• Solving for y, we obtain $y = \dfrac{c_1 x^{3/4}}{c_2 c_3 x} = c\,x^{-1/4}$.

Exponential Growth

• Linear growth: adding a fixed increment in each equal time period.

• Exponential growth: multiplying by a fixed number in each equal time period.
– Can also be looked at as growing by a fixed percentage.
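The contrast between the two growth patterns can be sketched with made-up numbers (a start of 100, an increment of 10, and a factor of 1.10 are all hypothetical): linear growth has constant differences between consecutive periods, while exponential growth has constant ratios.

```python
def linear_growth(start, increment, periods):
    """Add a fixed increment in each period."""
    return [start + increment * t for t in range(periods + 1)]

def exponential_growth(start, factor, periods):
    """Multiply by a fixed factor in each period (factor 1.10 = 10% growth)."""
    return [start * factor ** t for t in range(periods + 1)]

# Hypothetical values chosen only for illustration.
lin = linear_growth(100, 10, 5)
exp = exponential_growth(100, 1.10, 5)

# Linear growth: consecutive differences are constant.
differences = [b - a for a, b in zip(lin, lin[1:])]
# Exponential growth: consecutive ratios are constant.
ratios = [b / a for a, b in zip(exp, exp[1:])]

print(differences)
print([round(r, 4) for r in ratios])
```

Checking ratios of consecutive values is the same test applied to real data later in the deck (exercise 4.6, part B).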

p. 205

• Example 4.4

• Is this exponential growth?

• What is the projected amount for 2005?

• Actual was 203,000,000 (2005)

• Other interesting statistics:
– 2,000,000,000 cell phones worldwide
  • 4.5% world without
– Average American spends 13 talking hours per month
– Average American in 18 – 24 age group spends 22 talking hours per month

Texting in the United States

Logarithm

$\log_b x = y$ if and only if $b^y = x$

The rules for logarithms are

$\log(AB) = \log A + \log B$

$\log(A/B) = \log A - \log B$

$\log X^p = p \log X$

p. 209

• Example 4.6

4.6

• A. [Scatterplot: acres (vertical axis, 0 to 3,000,000) against year (horizontal axis, 1977 to 1982).]

4.6

• B. 226260/63024 = 3.59

907075/226260 = 4.01

2826095/907075 = 3.12

• C. log y yields 4.7996, 5.3546, 5.9576, 6.4512
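Parts B and C can be reproduced in a few lines. This sketch uses the acre counts exactly as they appear in the ratios of part B; roughly constant ratios of consecutive values are the signature of exponential growth, and the base-10 logs are what gets plotted in part C. Small last-digit differences from the rounded logs listed above are possible.

```python
import math

# Acre counts as they appear in the ratios of part B.
acres = [63024, 226260, 907075, 2826095]

# Part B: ratios of consecutive values (roughly constant for exponential growth).
ratios = [round(b / a, 2) for a, b in zip(acres, acres[1:])]

# Part C: base-10 logarithms of the counts.
logs = [round(math.log10(a), 4) for a in acres]

print(ratios)
print(logs)
```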

4.6

• C. [Scatterplot: log(acres) (vertical axis, 4.5 to 6.5) against year (horizontal axis, 1977 to 1982).]

4.6

• D. use calculator to confirm

• E. The residual plot of the transformed data shows no clear pattern, so the line is a reasonable model for these points.

4.6

• F. $\log \hat{y} = -1094.51 + 0.5558x$, so $10^{\log \hat{y}} = 10^{-1094.51 + 0.5558x}$, which gives $\hat{y} = 10^{-1094.51 + 0.5558x} = 10^{-1094.51} \cdot 10^{0.5558x}$.

4.6

• G. The predicted number of acres defoliated in 1982 is the exponential function evaluated at 1982, which gives 10,719,964.92 acres.
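Parts D through G can be sketched as a least-squares fit on the transformed data followed by a back-transformation. The log values are the ones listed in part C. The years 1978 through 1981 are an assumption: the slides do not list them, and this assignment is used because it reproduces the coefficients in part F. The helper names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# ASSUMPTION: the four observations are for 1978-1981 (not stated in the slides).
years = [1978, 1979, 1980, 1981]
# log10(acres) as listed in part C.
log_acres = [4.7996, 5.3546, 5.9576, 6.4512]

a, b = least_squares(years, log_acres)

def predict_acres(year):
    """Back-transform the fitted line: y-hat = 10 ** (a + b * year)."""
    return 10 ** (a + b * year)

prediction_1982 = predict_acres(1982)
print(round(a, 2), round(b, 4), round(prediction_1982))
```

Because 1982 lies outside the years used for the fit, the prediction is an extrapolation and should be read with the cautions discussed later in the chapter.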

4.9

$2^{4 \times 1} = 16$

$2^{4 \times 5} = 1{,}048{,}576$

4.10

• A.

Year    # children killed
1951    2
1952    4
1953    8
1954    16
1955    32
1956    64
1957    128
1958    256
1959    512
1960    1024

4.10

• B. [Scatterplot: # children killed (vertical axis, 0 to 1200) against year (horizontal axis, 1950 to 1961).]

4.10

• C. If x = number of years after 1950, then y = the number of children killed x years after 1950 = $2^x$. At x = 45, $y = 2^{45} = 3.52 \times 10^{13}$, or 35,200,000,000,000.

4.10

• D. [Scatterplot: log(# children killed) (vertical axis, 0 to 3.5) against year (horizontal axis, 1950 to 1961).]

4.10

• E. b = 0.3010, a = -587.008

$\log \hat{y} = -587.008 + 0.3010x$
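The coefficients in part E can be checked with a short fit. This sketch rebuilds the doubling table from part A, regresses the base-10 log of the count on the calendar year, and evaluates the model from part C at x = 45. Since the data double exactly each year, the slope should equal log 2. The helper `least_squares` is a hypothetical name.

```python
import math

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Part A: the count doubles every year, starting from 2 in 1951.
years = list(range(1951, 1961))
killed = [2 ** (year - 1950) for year in years]

# Parts D and E: regress log10(count) on the calendar year.
log_killed = [math.log10(k) for k in killed]
a, b = least_squares(years, log_killed)

# Part C: the model y = 2**x evaluated 45 years after 1950.
value_at_45 = 2 ** 45

print(round(a, 3), round(b, 4), value_at_45)
```

The value at x = 45 is far larger than any real population, which is why this exercise also serves as an extrapolation warning.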

p. 215

• Exponential growth models become linear when we apply the logarithm transformation to the response variable y.

• Power law models become linear when we apply the logarithm transformation to both variables.
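Both statements follow from the rules for logarithms applied to the two models named earlier, $y = AB^x$ and $y = Ax^B$. As a short derivation:

```latex
\begin{align*}
y = AB^{x} &\;\Longrightarrow\; \log y = \log A + (\log B)\,x
  && \text{(linear in } x\text{)},\\
y = Ax^{B} &\;\Longrightarrow\; \log y = \log A + B\,\log x
  && \text{(linear in } \log x\text{)}.
\end{align*}
```

In the exponential case the slope of the fitted line estimates $\log B$; in the power case it estimates the power $B$ itself.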

4.17

• A.

Year    Value
1       537.50
2       577.81
3       621.15
4       667.73
5       717.81
6       771.65
7       829.52
8       891.74
9       958.62
10      1030.52
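The table is consistent with a starting value of 500 growing by 7.5% per year. The slides do not state the exercise text, so both numbers are inferred from the listed values and should be treated as assumptions. A sketch that regenerates the table under those assumptions:

```python
def yearly_values(principal, rate, years):
    """Value at the end of each year under compound growth, rounded to cents."""
    return [round(principal * (1 + rate) ** t, 2) for t in range(1, years + 1)]

# ASSUMPTION: 500 at 7.5% per year, inferred from the table above.
table = yearly_values(500, 0.075, 10)

for year, value in enumerate(table, start=1):
    print(f"{year:>2}  {value:.2f}")
```

Each value is 1.075 times the previous one, so the constant-ratio check for exponential growth holds for every consecutive pair.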

4.17

• B. [Scatterplot: value (vertical axis, 500.00 to 1100.00) against year (horizontal axis, 0 to 11).]

4.17

• C. 2.73, 2.76, 2.79, 2.82, 2.86, 2.89, 2.92, 2.95, 2.98, 3.01

[Scatterplot: log(value) (vertical axis, 2.70 to 3.05) against year (horizontal axis, 0 to 12).]

4.18

• Alice has $500(1.075)^{25} = 3049.17$

• Fred has $500 + 100(25) = 3000.00$

Cautions About Correlation and Regression

Our Tools for Describing Data Sets

• Correlation
– r: strength and direction of a linear relationship

• Regression
– Generalized pattern
– Useful for predictions

• Limitations of our tools
– Correlation and regression describe only linear relationships
– The correlation “r” and the “LSRL” are NOT RESISTANT
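What "not resistant" means can be sketched with hypothetical data (none of these numbers come from the slides): one outlying point is enough to change both r and the slope of the least-squares line substantially.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Hypothetical, nearly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
r_clean, b_clean = correlation(xs, ys), slope(xs, ys)

# Add a single point that falls far below the pattern.
xs_out, ys_out = xs + [9], ys + [0.0]
r_outlier, b_outlier = correlation(xs_out, ys_out), slope(xs_out, ys_out)

print(round(r_clean, 3), round(b_clean, 3))
print(round(r_outlier, 3), round(b_outlier, 3))
```

A resistant measure, such as the median, would barely move when one such point is added; here both summaries drop sharply.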

Other Cautions

• Extrapolation
– The use of a regression line for prediction far outside the domain used.
– Examples:
  • Age v. Height
  • Time v. Death Rate (Swine Flu)
  • Time v. Water Level of a Lake
  • Time v. Children gunned down

Other Cautions

• Lurking Variables
– A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among these variables.
– Can falsely suggest relationship between x and y
– Can hide actual relationship between x and y

Other Cautions

• Lurking Variables
– An example….

• There's this guy who's going to clean the windows of a mental asylum. A patient follows him, shouting, "I gotta secret, I gotta secret...", but he ignores the patient. Again the patient follows him, and again he ignores the cries. By the time he has nearly finished the building, he is really curious about what the patient's secret is, so he decides to ask. The patient pulls a matchbox out of his pocket, opens it, and puts it on a table. Out crawls a little spider. The patient says, "Spider, go left," and the spider walks to its left a bit. Then he says, "Spider, go right," and the spider walks to its right a little bit. He says, "Spider, turn around, walk forward, then go right," and sure enough the spider turns around, walks forward, and then goes right a bit. The window cleaner is amazed. "Wow!" he says, "that's amazing!" "No, that's not my secret," says the patient. "Watch." He picks up the spider and pulls all its legs off, then puts it back on the table. "Spider, go right." The spider doesn't move. "Spider, go left." The spider doesn't move. "Spider, turn around." Again the spider doesn't move. "There!" he says. "That's my secret: if you pull all a spider's legs off, they go deaf."

• The answer is not available in the original data, but was discovered through some additional research on the Buick Estate Wagon. These data were collected by Consumers Union on a test track (rather than using the EPA test values for fuel efficiency) following the manufacturer's recommendations for each car's maintenance. Additional research revealed that starting with this model year, Buick recommended a higher tire inflation pressure for the Buick Estate Wagon. The recommended inflation pressure was higher than the level for other cars in the survey. Harder tires present less rolling resistance and improve gas mileage; therefore, the Buick Estate Wagon outperformed our expectations based on our regression model, which did not account for tire inflation pressure. In our model, tire pressure is a lurking variable: a variable that helps in predicting gas mileage but is not included in the model.

Other Cautions

• Using averaged data
– Pay particular attention to data that have been averaged
– The correlation and LSRL of these data sets should not be applied to the individuals that the averages came from

• Example
– Examining monthly data and attempting to apply it to a day of that month.

Beware the post-hoc fallacy

“Post hoc, ergo propter hoc.”

To avoid the post-hoc fallacy (assuming that an observed correlation is due to causation), you must put any statement of relationship through sharp inspection.

Causation cannot be established “after the fact.” It can only be established through well-designed experiments (see Ch. 5).

Explaining Association

Strong associations can generally be explained by one of three relationships.

Confounding: x may cause y, but y may instead be caused by a confounding variable z

Common Response: x and y are reacting to a lurking variable z

Causation: x causes y

Causation

Causation is not easily established.

The best evidence for causation comes from experiments that change x while holding all other factors fixed.

Even when direct causation is present, it is rarely a complete explanation of an association between two variables.

Even well-established causal relations may not generalize to other settings.

Common Response

“Beware the Lurking Variable”

The observed association between two variables may be due to a third variable.

Both x and y may be changing in response to changes in z.

Confounding

Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Confounding prevents us from drawing conclusions about causation.

We can help reduce the chances of confounding by designing a well-controlled experiment.

Example

People with two cars tend to live longer than people who own only one car. Owning three cars is even better, and so on. What might explain the association?

p. 238

• 4.38: People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar. Does artificial sweetener use cause weight gain?
– There may be a causative effect, but in the direction opposite to the one suggested: People who are overweight are more likely to be on diets, and so choose artificial sweeteners over sugar. Also, heavier people are at a higher risk of developing diabetes; if they do, they are likely to switch to artificial sweeteners.

p. 238

4.39: Women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed chemicals cause the miscarriages. Another explanation may be the fact that these workers spend a lot of time on their feet.
– Time standing up is a confounding variable in this case.

p. 239

4.41: Children who watch many hours of TV get lower grades on average than those who watch less TV. Why does this fact not show that watching TV causes low grades?

p. 239

4.43: High school students who take the SAT, enroll in an SAT coaching course, and take the SAT again raise their mathematics score from an average of 521 to 561. Can this increase be attributed entirely to taking the course?

The effects of coaching are confounded with those of experience. A student who has taken the SAT once may improve his or her score on the second attempt because of increased familiarity with the test.