tute11_4x1


Upload: cecilia-veronica-rana

Post on 06-Apr-2018


Page 1: Tute11_4x1

8/3/2019 Tute11_4x1

http://slidepdf.com/reader/full/tute114x1 1/15


Tutorial 11

Scope of this tutorial:

• Discussion - scatter plots

• Regression Exercise

• Revision of earlier hypothesis tests

Worksheet 1: Q.1 What sort of a relation is that?

(a) Max. daily temperature and soft drink sales
X (determinant): Max. daily temperature
Y (outcome): Soft drink sales
Relation: Positive

(b) Odometer reading and sale price of used cars
X (determinant): Odometer reading
Y (outcome): Sale price of used cars
Relation: Negative

(c) Annual income and credit card balance of bank clients
X (determinant): Annual income
Y (outcome): Credit card balance
Relation: Positive

Worksheet 1: Q.2: Looking for relationship

What relation can you see in the following scatter diagrams?

Rising trend, plus periodic rise and fall.

Negative linear relation between temperature and latitude: higher latitude => lower temp.

Positive linear relation between chest girth and weight for males.

Probably no relation between attendance of crowd at MCG and temperature.


Astronomy: galaxy

What is a galaxy? A galaxy is a collection of stars, ranging from ten million (10^7) up to a hundred trillion (10^14) stars.


Group of galaxies


Expansion of the universe –

after the Big Bang creation


Worksheet 2: Relation between distance from Earth and radial velocity of galaxies in the universe – Hubble’s law

There appears to be a positive relation between velocity and distance.

v = -40.784 + 454.158*distance


v = -40.784 + 454.158*distance

Meaning of the regression equation (or meaning of the slope):

For each increase of 1 megaparsec (Mpc) from Earth, velocity increases by 454 km/sec, on average.

(1 Mpc = 3.26 million light years)

[Graph annotation: a run of 1 Mpc corresponds to a rise of 454 km/sec]

Research Question: Is the distance from Earth a useful predictor of the radial velocity of galaxies?

H: Ho: β = 0

A: The relation appears reasonably linear. The points seem to be fairly evenly spread around the line with no obvious outliers, indicating that the residuals have constant standard deviation and may be normally distributed.

T: t = 6.036, df = 22

P: p-value ≈ 0. Since p < 0.05, reject Ho.

C:

There is a significant positive linear relation between distance and radial velocity (Hubble’s law).

For each increase of a distance of 1 megaparsec (Mpc) from Earth, a galaxy’s velocity increases by 454 km/sec, on average.

We are 95% confident that the true increase is between 298 and 610 km/sec.
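The confidence interval and the test statistic are two views of the same standard error. As a rough consistency check, the sketch below backs the standard error of the slope out of the reported interval and rebuilds t from it. The critical value 2.074 is an assumption taken from a t-table (95% confidence, df = 22), and the interval endpoints are the rounded values quoted above, so the result only approximately matches t = 6.036.

```python
# Consistency check: recover SE(slope) from the 95% CI, then rebuild t.
slope = 454.158                 # slope of the fitted line
ci_low, ci_high = 298.0, 610.0  # reported 95% CI for the slope (rounded)
t_crit = 2.074                  # assumption: t-table value, 95% confidence, df = 22

margin = (ci_high - ci_low) / 2   # half-width of the interval
se_slope = margin / t_crit        # because CI = slope +/- t_crit * SE
t_stat = slope / se_slope         # test statistic for Ho: beta = 0

print(f"margin = {margin:.1f}, SE = {se_slope:.1f}, t = {t_stat:.2f}")
```

The rebuilt statistic is close to the value on the slide; the small gap comes from rounding in the interval endpoints.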

Predictions:

Predict the radial velocity of a galaxy which is 1.25 Mpc from Earth.

v = -40.784 + 454.158*distance = -40.784 + 454.158*1.25 = 526.9 km/sec

Predict the radial velocity of a galaxy which is 2.25 Mpc from Earth.

2.25 megaparsecs is outside the range of the data, hence it is not valid to predict.
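The two predictions can be sketched as a small function that refuses to extrapolate. The function name `predict_velocity` and the range limits `X_MIN` and `X_MAX` are illustrative assumptions: the slides state only that 2.25 Mpc lies outside the data, not what the observed range is.

```python
# Fitted line from the slides: v = -40.784 + 454.158 * distance
INTERCEPT = -40.784
SLOPE = 454.158

# Assumption: placeholder limits for the observed distances (Mpc).
# Replace them with the actual minimum and maximum of the data.
X_MIN, X_MAX = 0.0, 2.0


def predict_velocity(distance_mpc, x_min=X_MIN, x_max=X_MAX):
    """Return the predicted velocity in km/sec, or None when the
    distance is outside the observed range (no extrapolation)."""
    if not (x_min <= distance_mpc <= x_max):
        return None
    return INTERCEPT + SLOPE * distance_mpc


print(predict_velocity(1.25))  # inside the assumed range
print(predict_velocity(2.25))  # outside the assumed range
```

Returning None for an out-of-range distance mirrors the "not valid to predict" answer rather than producing a number that looks trustworthy.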


Predictions:

Predict the distance from Earth for a celestial object which has a radial velocity of 400 km/sec.

Not valid to predict the independent variable (X) from the outcome (Y).

(For those curious: If we really want to predict distance from velocity, we need to re-do the regression using velocity as x (independent variable) and distance as y (dependent variable). Then the new regression will be Distance = a + b*velocity.)

Goodness-of-fit statistic r²

Interpret the goodness-of-fit statistic: r² = 0.624.

62.4% of the variation in radial velocity of galaxies can be explained by the variation in distance from Earth.

Calculate and interpret the correlation coefficient:

r = +√0.624 = 0.79, indicating there is a fairly strong positive linear relation between the two variables.
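The step from r² to r can be written out as a short calculation. This is a minimal sketch: the sign of r is taken from the sign of the slope, which is why the positive root is used here.

```python
import math

r_squared = 0.624   # goodness-of-fit statistic from the output
slope = 454.158     # slope of the fitted line (positive)

# r has the same sign as the slope, so copy the slope's sign onto the root.
r = math.copysign(math.sqrt(r_squared), slope)
percent_explained = r_squared * 100

print(f"r = {r:.2f}, variation explained = {percent_explained:.1f}%")
```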


Revision Questions

Variable  Description
ID        ID of male (1 – 252)
D         Density determined from underwater weighing
BF%       Percent body fat from Siri's (1956) equation
Age       Age (years)
W         Weight (kg)
H         Height (m)
BMI       Body Mass Index (kg/m²)
Nec       Neck circumference (cm)
Che       Chest circumference (cm)
Abd       Abdomen circumference (cm)
Hip       Hip circumference (cm)
Thi       Thigh circumference (cm)
Kne       Knee circumference (cm)
Ank       Ankle circumference (cm)
Bic       Biceps (extended) circumference (cm)
Arm       Forearm circumference (cm)
Wri       Wrist circumference (cm)


Question 1: Display

1. a) What type of graphical display should you provide to compare the percentage body fat (BF%) of males aged less than 39 years and males aged 39 years or more?

BF%: numeric (continuous) variable

Aged less than 39 years, or 39 years or more: binary variable => new variable

Hence comparative box plots.

b) An obese person is said to have a body mass index (BMI) of more than 30. What type of graphical display should you provide to compare the proportions of obese males aged less than 39 years with those aged 39 or more years?

BMI above or below 30 (obese or not obese): binary variable => new variable

Aged less than 39 years, or 39 years or more: binary variable => new variable

Hence clustered bar chart.
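Both displays depend on deriving new binary variables from Age and BMI. The sketch below shows that step and the per-group proportions a clustered bar chart would display. The records are made up purely for illustration and are not from the body-fat data set; the names `records`, `age_group` and `is_obese` are likewise assumptions.

```python
# Hypothetical (age, BMI) records, invented for illustration only.
records = [(25, 23.1), (31, 31.4), (36, 27.8), (42, 32.0), (55, 29.5), (61, 33.2)]


def age_group(age):
    """New binary variable: aged less than 39, or 39 and over."""
    return "<39" if age < 39 else ">=39"


def is_obese(bmi):
    """New binary variable: obese means a BMI of more than 30."""
    return bmi > 30


# Count males and obese males within each age group.
counts = {}
for age, bmi in records:
    group = age_group(age)
    total, obese = counts.get(group, (0, 0))
    counts[group] = (total + 1, obese + int(is_obese(bmi)))

# Proportion obese in each age group: the heights of the clustered bars.
proportions = {group: obese / total for group, (total, obese) in counts.items()}
print(proportions)
```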

Question 2: One-sample z-test

2. Research Question: Was the mean BMI of Australian males in 2008 the same as it was in the 1980s?

Assume the mean BMI of Australian men in the 1980s was equal to 25 with a SD of 3.5. In 2008, a random sample of 20 Australian males was selected and the BMI of each male was recorded.

27.84 30.44 29.86 31.04 30.81 24.93 23.57 21.23 30.98 26.25

24.84 27.03 31.02 23.54 25.49 29.38 24.52 27.62 32.94 22.46

Carry out a suitable hypothesis test to answer the research question. Assume that the variation in BMI has not changed.

One-sample z-test, NOT t-test

[Histogram of the sample: BMI (20 to 32) against frequency]

Sample SD s: NOT used

$$\text{95\% CI} = \bar{y} \pm 1.96 \times \frac{\sigma}{\sqrt{n}} = 27.2895 \pm 1.96 \times \frac{3.5}{\sqrt{20}} = (25.76, 28.82)$$
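The interval above can be reproduced from the 20 recorded BMI values. The sketch below also computes the z statistic and a two-sided p-value, which the slide does not show, so treat those two numbers as a worked illustration rather than the official answer. It uses the known σ = 3.5 and never the sample standard deviation.

```python
import math
from statistics import NormalDist, mean

# The 20 BMI values recorded in 2008 (from the question).
bmi = [27.84, 30.44, 29.86, 31.04, 30.81, 24.93, 23.57, 21.23, 30.98, 26.25,
       24.84, 27.03, 31.02, 23.54, 25.49, 29.38, 24.52, 27.62, 32.94, 22.46]

mu0 = 25.0     # mean BMI in the 1980s (null value)
sigma = 3.5    # known population SD, assumed unchanged

n = len(bmi)
y_bar = mean(bmi)
se = sigma / math.sqrt(n)          # standard error uses sigma, not s

z = (y_bar - mu0) / se             # one-sample z statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
ci = (y_bar - 1.96 * se, y_bar + 1.96 * se)    # 95% CI for the mean

print(f"mean = {y_bar:.4f}, z = {z:.3f}, p = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval does not contain 25, which agrees with a p-value below 0.05.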

Question 3(a)

Was there a difference between the average percentage body fat (BF%) of American males in 1985 aged less than 39 years and the average BF% of American males aged 39 years or more? => 2-sample t-test

[Histograms of BF% against frequency for the two age groups: <39 yrs and >39 yrs]

0.0003


We are 95% confident that the BF% of males aged over 39 years is between 1.87% and 6.26% higher than that of the younger males, on average.

$$\text{95\% CI for } \mu_1 - \mu_2 = (\bar{y}_1 - \bar{y}_2) \pm t_{\nu} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.066 \pm 2.198 = (1.87, 6.26)$$

The margin 2.198 is half the interval (CI/2) on each side of 4.066. Doubling it gives 2*2.198 = 4.396, which is NOT the CI. It is the length of the CI.
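The distinction between the interval and its length can be spelled out in a few lines. This is a minimal sketch using the two numbers from the output: the difference in sample means and the margin of error.

```python
diff = 4.066     # difference in sample means
margin = 2.198   # t * pooled SD * sqrt(1/n1 + 1/n2), as reported

ci = (diff - margin, diff + margin)   # the 95% CI: two endpoints
length = ci[1] - ci[0]                # the length of the CI: one number

print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), length = {length:.3f}")
```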

Question 3(b): Was the ankle circumference 5 cm more, on average, than the wrist circumference of American males in 1985? => paired t-test

[Histogram of the differences (2 to 12) against frequency]

Question 4: Regression

Research Question: Was the BMI of American males in 1985 a useful predictor of BF%?

Use the output to complete this question.

1. Which is the dependent/response variable?

2. Which is the independent/predictor variable?

3. Comment on the scatterplot.

4. Write down the equation of the regression line.

5. Test the statistical significance of the relation.

6. Predict, if appropriate, the expected % Body Fat for:

(a) a male with a BMI of 20; (b) a male with a BMI of 15

7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.

8. (a) Calculate r and interpret. (b) Calculate r² and interpret.

1. Which is the dependent/response variable? BF%

2. Which is the independent/predictor variable? BMI

3. Comment on the scatter plot.

• Positive linear relation: higher BMI => higher BF%

• Residuals have constant SD

• No outliers; points symmetric on both sides of the line

4. Regression equation

$$\widehat{BF\%} = -26.9872 + 1.8186 \times BMI$$

[Scatter plot: BF% (0 to 50) against BMI (15 to 40)]


5. Test the statistical significance of the relation.

6. Predict, if appropriate, the expected % Body Fat for: (a) a male with a BMI of 20; (b) a male with a BMI of 15

A male with a BMI of 20:
BF% = -26.9872 + 1.8186*20 = 9.385

A male with a BMI of 15:
Not valid to predict, since 15 is outside the range of the data.

7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.

Not valid to predict the independent variable (predictor or x) from the dependent variable (outcome or y).

8. (a) Calculate r and interpret.

r = +√0.535 = 0.73

There is a fairly strong positive linear relation between BMI and BF%.

(b) Calculate r² and interpret.

r² = 0.535

This indicates that about 53.5% of the variation in BF% can be explained by the variation in BMI.

Question 5: Best predictor

Research Question: Which of the BMI, Neck circumference or Abdomen circumference is the best predictor of BF%?


Best predictor

Fill in the table. Explain your answer.

Each of the predictors is a significant predictor of BF%; the p-val for each of the predictors is 0.000.

Each regression equation satisfies the assumptions of linearity, constant spread and normality of the residuals.

However, the abdomen circumference (Abd) provides the best fit, as its r² = 67% is much higher than the others.

Note:
1. NEVER compare values of b.
2. It is easier, and better, to compare r² instead of p-vals.
3. Discard (cross out) variables if they break assumptions or if p-val > 0.05.

Practice Exercises: Question 1

Consider the computer output which shows the relation between students’ ideal weights and their actual weights for females. Note the dotted line represents the cases when the ideal weight is the same as the actual weight.


Question 1

(a) By comparing the regression line (solid) with the

line y=x (ie ideal weight_f = weight_f) (dotted),

comment on the scatter plot.

Question 1

(b) From the partial EcStat output above, perform an appropriate hypothesis test to see if there is a linear relation between Ideal weight_f (Y) and Weight_f (X).

Partial ans: t = 21.29


Question 1 (answers)


Question 1 (continued)

(c) What is the value of goodness-of-fit statistic?

Interpret its meaning.

Ans: 70.8% Meaning: ……………….

Question 2

The table below shows Accounting and Statistics marks for 12 students.

Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)?

Use the partial EcStat output below to answer the research question.

Acc   Stat
74    81
93    86
55    67
41    35
23    30
92    100
64    55
40    52
71    76
33    24
30    48
71    87

outcome: Stat    df: 10
predictor   coeff    SE      t        p-value   95% C.I.
constant    7.0194   7.971   0.8806   0.399     (-10.741, 24.779)
Acc         0.9560   0.129
r-sq: 0.845   Resid SS: 1046.876   s: 10.232

[Scatter plot: Stat (20 to 110) against Acc (20 to 100)]


Question 2 (answers)

(Partial Ans: t=7.411)
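The EcStat figures for this question can be reproduced from the 12 pairs of marks with the usual least-squares formulas. The sketch below is a hand calculation in plain Python, not EcStat's own method; variable names such as `sxx` and `se_slope` are illustrative. Computed from unrounded values, the test statistic comes out near 7.39, as in the full output shown in the computer exercises; the partial answer 7.411 arises from dividing the rounded coefficient by the rounded SE (0.9560 / 0.129).

```python
import math

# Accounting (X) and Statistics (Y) marks for the 12 students.
acc = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
stat = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

n = len(acc)
x_bar = sum(acc) / n
y_bar = sum(stat) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in acc)
syy = sum((y - y_bar) ** 2 for y in stat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(acc, stat))

slope = sxy / sxx                      # b
intercept = y_bar - slope * x_bar      # a
resid_ss = syy - slope * sxy           # residual sum of squares
r_squared = 1 - resid_ss / syy         # goodness-of-fit statistic
s = math.sqrt(resid_ss / (n - 2))      # residual SD, df = n - 2
se_slope = s / math.sqrt(sxx)          # standard error of b
t_stat = slope / se_slope              # test statistic for Ho: beta = 0

print(f"Stat = {intercept:.4f} + {slope:.4f} * Acc")
print(f"r-sq = {r_squared:.3f}, Resid SS = {resid_ss:.3f}, s = {s:.3f}")
print(f"SE(b) = {se_slope:.3f}, t = {t_stat:.4f}, df = {n - 2}")
```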


Question 2 (continued)

(b) What is the value of goodness-of-fit statistic?

Interpret its meaning.

Question 3

Research question: Can Weight (X) be used to predict Height (Y)?

Use the partial EcStat output below to answer the research question.

[Scatter plot: Height (150 to 195) against Weight (40 to 100)]

outcome: Height    df: 82
predictor   coeff      SE      t         p-value   95% C.I.
constant    130.1702   4.041   32.2109   0.000     (122.131, 138.209)
Weight      0.6699     0.061
r-sq: 0.595   Resid SS: 2855.483   s: 5.901


Question 3 (answers)

(Partial Ans: t=10.98)


Question 3 (continued)

(b) What is the value of goodness-of-fit statistic?

Interpret its meaning.


Question 4

(Swap X and Y in Question 3.)

Research question: Can Height (X) be used to predict Weight (Y)?

Use the partial EcStat output below to answer the research question.

[Scatter plot: Weight (40 to 100) against Height (150 to 190)]

outcome: Weight    df: 82
predictor   coeff      SE       t         p-value   95% C.I.
constant    -89.1380   14.096   -6.3238   0.000     (-117.179, -61.097)
Height      0.8881     0.081    10.9760   0.000     (0.727, 1.049)
r-sq: 0.595   Resid SS: 3785.467   s: 6.794


Question 4 (answers)

(Partial Ans: t=10.976)


Question 4 (continued)

(b) What is the value of goodness-of-fit statistic? Interpret its meaning.

(c) Explain why the value of r² is the same as that in Question 3.
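One way to see part (c) numerically: in simple linear regression the correlation between two variables does not depend on which one is called X, and the product of the two slopes (Y on X, and X on Y) equals r². The sketch below checks this with the rounded coefficients from the two outputs, so the agreement is only to about three decimal places. It also rebuilds the slope's t statistic for Question 3 from the rounded coefficient and SE.

```python
# Slopes reported in the two EcStat outputs (rounded values).
b_height_on_weight = 0.6699   # Question 3: Height = a + b * Weight
b_weight_on_height = 0.8881   # Question 4: Weight = a + b * Height
se_q3 = 0.061                 # SE of the Question 3 slope

# Product of the two slopes equals r-squared.
r_squared_from_slopes = b_height_on_weight * b_weight_on_height

# Slope t statistic for Question 3 (close to the 10.976 of Question 4).
t_q3 = b_height_on_weight / se_q3

print(f"product of slopes = {r_squared_from_slopes:.3f}")
print(f"t for Question 3 = {t_q3:.2f}")
```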

Question 5

For each of the following given regression equations, interpret (i) the equation and (ii) r².

(a) X = time a bee spends on a flower

Y = % pollen removed

y = 13 + 2.05x, r² = 0.384

Interpretation of equation (slope):

Interpretation of r²:


Question 5 (continued)

(b) X = students’ high school results

Y = STAT170 exam results

y = 29.23 + 0.54x, r² = 6.2%

Question 5 (continued)

(c) X = number of cans of beer drunk

Y = blood alcohol content

ŷ = -0.0217 + 0.0203x, r² = 82.1%


Computer (EcStat) Exercises

Question 1 (Q.2 of previous exercise)

Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)?

1. Enter the 2 columns as shown.

2. Optional but recommended: Pre-highlight Y (Account), then press the Ctrl key and highlight X (Stat).

3. Click “Relationship” (4th icon).


Make sure the X (Account) and Y (Stat) are chosen correctly, otherwise you will have the wrong graph and the wrong regression results.

outcome: Stat (Y)    df: 10
predictor     coeff    SE      t        p-value   95% C.I.
constant      7.0194   7.971   0.8806   0.399     (-10.741, 24.779)
Account (X)   0.9560   0.129   7.3927   0.000     (0.668, 1.244)
r-sq: 0.845   Resid SS: 1046.876   s: 10.232

Fitted line: Stat (Y) = 7.0194 + 0.956 Account (X)

[Scatter plot: Stat (Y) (20 to 110) against Account (X) (20 to 100)]

Question 1 (continued)

Fill in the following answers:

(a) Ho: ___________________

(b) Write down the regression equation: ______________________________

(c) What is the value of test statistic? (Include symbol z/t) ___________________

(d) What is the value of p-val? ________

(e) Do you reject or not reject Ho? _________

(f) What is a 95% CI for β? ____________________

(g) Does the 95% CI for β include the null value? ______

(h) What is the value of goodness-of-fit statistic? _______

Question 2 (Pract 8 Exercises)

Load the file “pulse.xls” (used in Pract/WASP 8)

Research question: Can Height (X) be used to predict Weight (Y)?

Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 2 (continued)

Fill in the following answers:

(a) Ho: ___________________

(b) Write down the regression equation: ______________________________

(c) What is the value of test statistic? (Include symbol z/t) ___________________

(d) What is the value of p-val? ________

(e) Do you reject or not reject Ho? _________

(f) What is a 95% CI for β? ____________________

(g) Does the 95% CI for β include the null value? ______

(h) What is the value of goodness-of-fit statistic? _______

Question 3 (Pract 8 Exercises)

Load the file “Storks.xls” (used in Pract/WASP 8)

Research question: Can the number of storks (Stork) be used to predict the number of babies born (Birth)?

Perform the hypothesis test using EcStat. Then answer the questions on the next slide.

Question 3 (continued)

Fill in the following answers:

(a) Ho: ___________________

(b) Write down the regression equation: ______________________________

(c) What is the value of test statistic? (Include symbol z/t) ___________________

(d) What is the value of p-val? ________

(e) Do you reject or not reject Ho? _________

(f) What is a 95% CI for β? ____________________

(g) Does the 95% CI for β include the null value? ______

(h) What is the value of goodness-of-fit statistic? _______

Question 4 (Pract 8 Exercises)

Load the file “Peru.xls” (used in Pract/WASP 8)

Research question: Can the number of years (Years) since migration be used to predict the systolic blood pressure (Systol)?

Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 4 (continued)

Fill in the following answers:

(a) Ho: ___________________

(b) Write down the regression equation: ______________________________

(c) What is the value of test statistic? (Include symbol z/t) ___________________

(d) What is the value of p-val? ________

(e) Do you reject or not reject Ho? _________

(f) What is a 95% CI for β? ____________________

(g) Does the 95% CI for β include the null value? ______

(h) What is the value of goodness-of-fit statistic? _______

Question 5 (Pract 8 Exercises)

Continue with the file “Peru.xls”.

Research question: Can Forearm (X) be used to predict Weight (Y)?

Perform the hypothesis test using EcStat. Then answer the questions on the next slide.

Question 5 (continued)

Fill in the following answers:

(a) Ho: ___________________

(b) Write down the regression equation: ______________________________

(c) What is the value of test statistic? (Include symbol z/t) ___________________

(d) What is the value of p-val? ________

(e) Do you reject or not reject Ho? _________

(f) What is a 95% CI for β? ____________________

(g) Does the 95% CI for β include the null value? ______

(h) What is the value of goodness-of-fit statistic? _______