tute11_4x1
8/3/2019 Tute11_4x1
http://slidepdf.com/reader/full/tute114x1 1/15
Tutorial 11
Scope of this tutorial:
• Discussion - scatter plots
• Regression Exercise
• Revision of earlier hypothesis tests
Worksheet 1: Q.1 What sort of relation is that?
Scenario | X (determinant) | Y (outcome) | Relation
(a) Max. daily temperature and soft drink sales | Max. daily temperature | Soft drink sales | Positive
(b) Odometer reading and sale price of used cars | Odometer reading | Sale price of used cars | Negative
(c) Annual income and credit card balance of bank clients | Annual income | Credit card balance | Positive
Worksheet 1: Q.2: Looking for relationship
What relation can you see in the following scatter diagrams?
• Rising trend, plus periodic rise and fall.
• Negative linear relation between temperature and latitude: higher latitude => lower temp.
• Positive linear relation between chest girth and weight for males.
• Probably no relation between attendance of crowd at MCG and temperature.
Astronomy: galaxy
What is a galaxy? A galaxy is a collection of stars, ranging from ten million (10^7) up to a hundred trillion (10^14) stars.
Group of galaxies
Expansion of the universe –
after the Big Bang creation
Worksheet 2: Relation between distance from Earth and radial velocity of galaxies in the universe – Hubble’s law
There appears to be a positive relation between velocity and distance.
v = -40.784 + 454.158*distance
v = -40.784+454.158*distance
Meaning of the regression
equation (or meaning of the
slope):
For each increase of 1
megaparsec (Mpc) from Earth,
velocity increases by 454
km/sec, on average.
(1 Mpc=3.26 million light
years)
[Figure: fitted line annotated with a run of 1 Mpc and a rise of 454 km/sec]
Research Question: Is the distance from Earth a useful predictor of the radial velocity of galaxies?
H: Ho: β = 0
A: The relation appears reasonably linear. The points seem to be fairly evenly spread round the line with no obvious outliers, indicating that the residuals have constant standard deviation and may be normally distributed.
T: t = 6.036, df = 22
P: p-value ≈ 0. Since p < 0.05, reject Ho.
C: There is a significant positive linear relation between distance and radial velocity (Hubble’s law). For each increase in distance of 1 megaparsec (Mpc) from Earth, a galaxy’s velocity increases by 454 km/sec, on average. We are 95% confident that the true increase is between 298 and 610 km/sec.
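The 95% confidence interval quoted above can be checked from the slope and the t statistic alone. The sketch below assumes the usual form t = slope / SE for the test of Ho: β = 0, and takes the critical value 2.074 (t table, 22 df, two-sided 95%) as a looked-up constant rather than computing it.

```python
# Values copied from the slide.
slope = 454.158   # km/sec per Mpc
t_stat = 6.036
df = 22

# Assumption: two-sided 95% critical value of t with 22 df, read from a t table.
T_CRIT_22 = 2.074

# Under Ho: beta = 0 the test statistic is t = slope / SE, so SE = slope / t.
se_slope = slope / t_stat

# CI = slope +/- t_crit * SE
margin = T_CRIT_22 * se_slope
ci_low = slope - margin
ci_high = slope + margin

print(f"SE of slope: {se_slope:.2f}")
print(f"95% CI for slope: ({ci_low:.0f}, {ci_high:.0f}) km/sec per Mpc")
```

The interval does not include 0, which agrees with rejecting Ho.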
Predictions:
Predict the radial velocity of a galaxy which is 1.25 Mpc from Earth.
v = -40.784 + 454.158*distance = -40.784 + 454.158*1.25 = 526.9 km/sec
Predict the radial velocity of a galaxy which is 2.25 Mpc from Earth.
2.25 megaparsecs is outside the range of the data, hence it is not valid to predict.
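The two predictions above follow one rule: use the fitted line only inside the range of the observed distances. A minimal sketch of that rule is given here. The bounds D_MIN and D_MAX are illustrative assumptions, since the slides only say that 1.25 Mpc is usable and 2.25 Mpc is out of range.

```python
INTERCEPT = -40.784   # km/sec, from the fitted line on the slide
SLOPE = 454.158       # km/sec per Mpc, from the fitted line on the slide

# Assumption: illustrative bounds for the observed distances (Mpc).
# The real minimum and maximum should be read from the data set.
D_MIN, D_MAX = 0.0, 2.0


def predict_velocity(distance_mpc):
    """Return the predicted velocity in km/sec, or None when extrapolating."""
    if not (D_MIN <= distance_mpc <= D_MAX):
        return None  # outside the data range: not valid to predict
    return INTERCEPT + SLOPE * distance_mpc


for d in (1.25, 2.25):
    v = predict_velocity(d)
    if v is None:
        print(f"{d} Mpc: out of range of data, not valid to predict")
    else:
        print(f"{d} Mpc: {v:.1f} km/sec")
```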
Predictions:
Predict the distance from Earth for a celestial object which has a radial velocity of 400 km/sec.
Not valid to predict the independent variable (X) from the outcome (Y).
(For those curious: if we really want to predict distance from velocity, we need to redo the regression using velocity as x (independent variable) and distance as y (dependent variable). The new regression will then be Distance = a + b*velocity.)
Goodness-of-fit statistic r²
Interpret the goodness-of-fit statistic: r² = 0.624.
62.4% of the variation in radial velocity of galaxies can be explained by the variation in distance from Earth.
Calculate and interpret the correlation coefficient:
r = +√0.624 = 0.79, indicating there is a fairly strong positive linear relation between the two variables.
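The sign in front of the square root is taken from the slope of the fitted line: a positive slope gives a positive r. A small sketch of that step follows; the function name correlation_from_r2 is made up for illustration.

```python
import math


def correlation_from_r2(r_squared, slope):
    """Return r = sign(slope) * sqrt(r_squared) for a simple linear regression."""
    return math.copysign(math.sqrt(r_squared), slope)


# Hubble example from the slides: r-squared = 0.624, slope = 454.158 (positive).
r = correlation_from_r2(0.624, 454.158)
print(round(r, 2))
```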
Revision Questions
Variable | Description
ID | ID of male (1 – 252)
D | Density determined from underwater weighing
BF% | Percent body fat from Siri's (1956) equation
Age | Age (years)
W | Weight (kg)
H | Height (m)
BMI | Body Mass Index (kg/m²)
Nec | Neck circumference (cm)
Che | Chest circumference (cm)
Abd | Abdomen circumference (cm)
Hip | Hip circumference (cm)
Thi | Thigh circumference (cm)
Kne | Knee circumference (cm)
Ank | Ankle circumference (cm)
Bic | Biceps (extended) circumference (cm)
Arm | Forearm circumference (cm)
Wri | Wrist circumference (cm)
Question 1: Display
1. a) What type of graphical display should you provide to compare the percentage body fat (BF%) of males aged less than 39 years and males aged 39 years or more?
• BF%: numeric (continuous) variable
• Aged less than 39 years or 39 years or more: binary variable (new variable)
Hence comparative box plots.
b) An obese person is said to have a body mass index (BMI) of more than 30. What type of graphical display should you provide to compare the proportions of obese males aged less than 39 years with those aged 39 or more years?
• BMI above or below 30 (obese or not obese): binary variable (new variable)
• Aged less than 39 years or 39 years or more: binary variable (new variable)
Hence clustered bar chart.
Question 2: One-sample z-test
2. Research Question: Was the mean BMI of Australian males in 2008 the same as it was in the 1980s?
Assume the mean BMI of Australian men in the 1980s was equal to 25 with a SD of 3.5. In 2008, a random sample of 20 Australian males was selected and the BMI of each male was recorded.
27.84 30.44 29.86 31.04 30.81 24.93 23.57 21.23 30.98 26.25
24.84 27.03 31.02 23.54 25.49 29.38 24.52 27.62 32.94 22.46
Carry out a suitable hypothesis test to answer the research question. Assume that the variation in BMI has not changed.
One-sample z-test, NOT t-test
[Figure: histogram of BMI (x-axis: BMI, y-axis: Freq.)]
The sample SD s is NOT used, because σ = 3.5 is assumed known.
95% CI = ȳ ± 1.96 × σ/√n = 27.2895 ± 1.96 × 3.5/√20 = (25.76, 28.82)
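The numbers in this z-test can be reproduced from the 20 BMI values listed in Question 2. The sketch below assumes a two-sided test of Ho: μ = 25 with known σ = 3.5 and uses only the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean

# Sample of 20 BMI values from the slide.
bmi = [27.84, 30.44, 29.86, 31.04, 30.81, 24.93, 23.57, 21.23, 30.98, 26.25,
       24.84, 27.03, 31.02, 23.54, 25.49, 29.38, 24.52, 27.62, 32.94, 22.46]

mu0 = 25.0    # mean BMI in the 1980s (null value)
sigma = 3.5   # population SD, assumed unchanged, so a z-test applies

n = len(bmi)
y_bar = mean(bmi)
se = sigma / math.sqrt(n)   # standard error uses sigma, not the sample SD s

# Test statistic and two-sided p-value from the standard normal distribution.
z = (y_bar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for the mean.
margin = 1.96 * se
ci = (y_bar - margin, y_bar + margin)

print(f"mean = {y_bar:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes the null value 25, which is consistent with a p-value below 0.05.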
Question 3(a)
Was there a difference between the average percentage body fat (BF%) of American males in 1985 aged less than 39 years and the average BF% of American males aged 39 years or more? => 2-sample t-test
[Figure: histograms of BF% (y-axis: Freq.) for the two age groups, labelled "<39yrs" and ">39yrs"]
0.0003
We are 95% confident that the BF% of males aged 39 years or more is between 1.87 and 6.26 percentage points higher, on average, than that of the younger males.
95% CI for μ1 − μ2 = (ȳ1 − ȳ2) ± t_ν × s_p × √(1/n1 + 1/n2) = 4.066 ± 2.198 = (1.87, 6.26)
The margin of error, 2.198, is half the length of the CI (CI/2 on each side of 4.066).
2 × 2.198 = 4.396 is NOT the CI. It is the length of the CI.
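The difference between the margin of error, the CI and the length of the CI can be seen with the two numbers from the slide. This is only an arithmetic sketch; the variable names are illustrative.

```python
diff = 4.066     # difference in sample means, from the slide
margin = 2.198   # t * s_p * sqrt(1/n1 + 1/n2), from the slide

# The CI is the pair of end points: estimate minus and plus the margin.
ci = (diff - margin, diff + margin)

# The length of the CI is the distance between the end points, 2 * margin.
length = ci[1] - ci[0]

print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"length of CI = {length:.3f}")
```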
Question 3(b): Was the ankle circumference 5 cm more, on average, than the wrist circumference of American males in 1985? => paired t-test
[Figure: histogram of the difference (x-axis: difference, y-axis: Freq.)]
Question 4: Regression
Research Question: Was the BMI of American males in 1985 a useful predictor of BF%?
Use the output to complete this question.
1. Which is the dependent/response variable?
2. Which is the independent/predictor variable?
3. Comment on the scatterplot.
4. Write down the equation of the regression line.
5. Test the statistical significance of the relation.
6. Predict, if appropriate, the expected % Body Fat for:
(a) a male with a BMI of 20; (b) a male with a BMI of 15
7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.
8. (a) Calculate r and interpret. (b) Calculate r² and interpret.
1. Which is the dependent/response variable? BF%
2. Which is the independent/predictor variable? BMI
3. Comment on the scatter plot.
• Positive linear relation: higher BMI => higher BF%
• Residuals have constant SD
• No outliers; symmetric on both sides
4. Regression equation
predicted BF% = -26.9872 + 1.8186*BMI
[Figure: scatter plot of BF% vs BMI (x-axis: BMI, 15 to 40; y-axis: BF%, 0 to 50)]
5. Test the statistical significance of the relation.
6. Predict, if appropriate, the expected % Body Fat for: (a) a male with a BMI of 20; (b) a male with a BMI of 15
A male with a BMI of 20: BF% = -26.9872 + 1.8186*20 ≈ 9.38
A male with a BMI of 15: Not valid to predict, since 15 is out of the range of the data.
7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.
Not valid to predict the independent variable (predictor or x) from the dependent variable (outcome or y).
8. (a) Calculate r and interpret.
r = √0.535 = 0.73
There is a fairly strong positive linear relation between BMI and BF%.
(b) Calculate r² and interpret.
r² = 0.535
This indicates that about 53.5% of the variation in BF% can be explained by the variation in BMI.
Question 5: Best predictor
Research Question: Which of BMI, neck circumference or abdomen circumference is the best predictor of BF%?
Best predictor
Fill in the table. Explain your answer.
Each of the predictors is a significant predictor of BF%; the p-val for each of the predictors is 0.000.
Each regression equation satisfies the assumptions of linearity, constant spread and normality of the residuals.
However, the abdomen circumference (Abd) provides the best fit, as its r² = 67% is much higher than the others.
Note:
1. NEVER compare values of b.
2. It is easier, and better, to compare r² instead of p-vals.
3. Discard (cross out) variables if they break assumptions or if p-val > 0.05.
Practice Exercises: Question 1
Consider the computer output which shows the relation between students’ ideal weights and their actual weights for females. Note the dotted line represents the cases when the ideal weight is the same as the actual weight.
Question 1
(a) By comparing the regression line (solid) with the
line y=x (ie ideal weight_f = weight_f) (dotted),
comment on the scatter plot.
Question 1
(b) From the partial EcStat output above, perform an appropriate hypothesis test to see if there is a linear relation between Ideal weight_f (Y) and Weight_f (X).
Partial ans: t=21.29
Question 1 (answers)
Question 1 (continued)
(c) What is the value of goodness-of-fit statistic?
Interpret its meaning.
Ans: 70.8% Meaning: ……………….
Question 2
The table below shows Accounting and Statistics marks for 12 students.
Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)?
Use the partial EcStat output below to answer the research question.
Acc | Stat
74 | 81
93 | 86
55 | 67
41 | 35
23 | 30
92 | 100
64 | 55
40 | 52
71 | 76
33 | 24
30 | 48
71 | 87
Partial EcStat output (outcome: Stat; df: 10):
predictor | coeff | SE | t | p-value | 95% C.I.
constant | 7.0194 | 7.971 | 0.8806 | 0.399 | (-10.741, 24.779)
Acc | 0.9560 | 0.129 | | |
r-sq: 0.845   Resid SS: 1046.876   s: 10.232
[Figure: scatter plot of Stat vs Acc (x-axis: Acc, 20 to 100; y-axis: Stat, 20 to 110)]
Question 2 (answers)
(Partial Ans: t = 0.9560/0.129 = 7.411 from the rounded values; the full EcStat output reports t = 7.3927)
Question 2 (continued)
(b) What is the value of goodness-of-fit statistic?
Interpret its meaning.
Question 3
Research question: Can Weight (X) be used to predict Height (Y)?
Use the partial EcStat output below to answer the research question.
[Figure: scatter plot of Height vs Weight (x-axis: Weight, 40 to 100; y-axis: Height, 150 to 195)]
Partial EcStat output (outcome: Height; df: 82):
predictor | coeff | SE | t | p-value | 95% C.I.
constant | 130.1702 | 4.041 | 32.2109 | 0.000 | (122.131, 138.209)
Weight | 0.6699 | 0.061 | | |
r-sq: 0.595   Resid SS: 2855.483   s: 5.901
Question 3 (answers)
(Partial Ans: t=10.98)
Question 3 (continued)
(b) What is the value of goodness-of-fit statistic?
Interpret its meaning.
Question 4
(Swap X and Y in Question 3.)
Research question: Can Height (X) be used to predict Weight (Y)?
Use the partial EcStat output below to answer the research question.
[Figure: scatter plot of Weight vs Height (x-axis: Height, 150 to 190; y-axis: Weight, 40 to 100)]
EcStat output (outcome: Weight; df: 82):
predictor | coeff | SE | t | p-value | 95% C.I.
constant | -89.1380 | 14.096 | -6.3238 | 0.000 | (-117.179, -61.097)
Height | 0.8881 | 0.081 | 10.9760 | 0.000 | (0.727, 1.049)
r-sq: 0.595   Resid SS: 3785.467   s: 6.794
Question 4 (answers)
(Partial Ans: t=10.976)
Question 4 (continued)
(b) What is the value of goodness-of-fit statistic?
Interpret its meaning.
(c) Explain why the value of r² is the same as that in Question 3.
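One way to check part (c) numerically uses a standard identity: the slope of Y on X is Sxy/Sxx and the slope of X on Y is Sxy/Syy, so their product is Sxy²/(Sxx × Syy) = r², which does not depend on which variable is called X. The sketch below only multiplies the two slopes reported in Questions 3 and 4; the variable names are illustrative.

```python
# Slopes copied from the EcStat output in Questions 3 and 4.
slope_height_on_weight = 0.6699   # Question 3: Height (Y) on Weight (X)
slope_weight_on_height = 0.8881   # Question 4: Weight (Y) on Height (X)

# Product of the two slopes equals r-squared for the pair of variables.
r_squared = slope_height_on_weight * slope_weight_on_height

print(round(r_squared, 3))
```

The product agrees with the r-sq value of 0.595 shown in both outputs, up to rounding of the slopes.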
Question 5
For each of the following given regression equations, interpret (i) the equation and (ii) r².
(a) X = time a bee spends on a flower
Y = % pollen removed
ŷ = 13 + 2.05x, r² = 0.384
Interpretation of equation (slope):
Interpretation of r²:
Question 5 (continued)
(b) X = students’ high school results
Y = STAT170 exam results
ŷ = 29.23 + 0.54x, r² = 6.2%
Question 5 (continued)
(c) X = number of cans of beer drunk
Y = blood alcohol content
ŷ = -0.0217 + 0.0203x, r² = 82.1%
Computer (EcStat) Exercises
Question 1 (Q.2 of previous exercise)
Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)?
1. Enter the 2 columns as shown.
2. Optional but recommended: Pre-highlight X (Account), then press the Ctrl key and highlight Y (Stat).
3. Click “Relationship” (4th icon).
Make sure the X (Account) and Y (Stat) are chosen correctly, otherwise you will have the wrong graph and wrong regression results.
EcStat output (outcome: Stat (Y); df: 10):
predictor | coeff | SE | t | p-value | 95% C.I.
constant | 7.0194 | 7.971 | 0.8806 | 0.399 | (-10.741, 24.779)
Account (X) | 0.9560 | 0.129 | 7.3927 | 0.000 | (0.668, 1.244)
r-sq: 0.845   Resid SS: 1046.876   s: 10.232
Fitted line: Stat (Y) = 7.0194 + 0.956 Account (X)
[Figure: scatter plot of Stat (Y) vs Account (X) with fitted line (x-axis: 20 to 100; y-axis: 20 to 110)]
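The EcStat figures for this example can be reproduced by hand from the 12 pairs of marks given in the practice exercise. The following is a minimal least-squares sketch in plain Python, not EcStat's own code; the function name simple_regression and the dictionary keys are made up for illustration.

```python
import math

# Accounting (X) and Statistics (Y) marks for the 12 students.
acc = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
stat = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]


def simple_regression(x, y):
    """Fit y = a + b*x by least squares and return the usual summary values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = syy - slope * sxy        # residual sum of squares
    df = n - 2
    s = math.sqrt(resid_ss / df)        # residual standard deviation
    se_slope = s / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "df": df,
        "resid_ss": resid_ss,
        "s": s,
        "r_squared": 1 - resid_ss / syy,
        "t_slope": slope / se_slope,    # test statistic for Ho: beta = 0
    }


fit = simple_regression(acc, stat)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Swapping the two lists in the call fits Account on Stat instead, which is why choosing X and Y correctly matters.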
Question 1(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation:
______________________________
(c) What is the value of test statistic? (Include symbol z/t)___________________
(d) What is the value of p-val? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of goodness-of-fit statistic? _______
Question 2 (Pract 8 Exercises)
Load the file “pulse.xls” (used in Pract/WASP 8)
Research question: Can Height (X) be used to predict Weight (Y)?
Perform the hypothesis test using EcStat. Then answer the questions on the next slide.
Question 2(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation:______________________________
(c) What is the value of test statistic? (Include symbol z/t)___________________
(d) What is the value of p-val? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β ? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of goodness-of-fit statistic? _______
Question 3 (Pract 8 Exercises)
Load the file “Storks.xls” (used in Pract/WASP 8)
Research question: Can the number of storks (Stork) be used to predict the number of babies born (Birth)?
Perform the hypothesis test using EcStat. Then answer the questions on the next slide.
Question 3(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation:
______________________________
(c) What is the value of test statistic? (Include symbol z/t)___________________
(d) What is the value of p-val? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of goodness-of-fit statistic? _______
Question 4 (Pract 8 Exercises)
Load the file “Peru.xls” (used in Pract/WASP 8)
Research question: Can the number of years (Years) since migration be used to predict the systolic blood pressure (Systol)?
Perform the hypothesis test using EcStat. Then answer the questions on the next slide.
Question 4 (continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation:______________________________
(c) What is the value of test statistic? (Include symbol z/t)___________________
(d) What is the value of p-val? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β ? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of goodness-of-fit statistic? _______
Question 5 (Pract 8 Exercises)
Continue with the file “Peru.xls”.
Research question: Can Forearm (X) be used to predict Weight (Y)?
Perform the hypothesis test using EcStat. Then answer the questions on the next slide.
Question 5 (continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation:
______________________________
(c) What is the value of test statistic? (Include symbol z/t)___________________
(d) What is the value of p-val? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of goodness-of-fit statistic? _______