STAT E-100 Section 2—Monday, September 23, 2013 5:30-6:30PM, SC113

Upload: alicia-barnett

Post on 16-Dec-2015



Housekeeping

Homework assignments are due at the beginning of each class (5:30PM on Tuesday)

Comments on homework

If you have any questions about the homework, email me at [email protected]

Section review can be found on course website under “Sections” tab

General Questions

General questions?

Questions about HW assignments 1 or 2? Questions about SPSS?

Sample Question 2

A survey was given to a section of Stat 104 students last fall. It asked the students to report their self-recorded heart rate and the number of hours they exercised per week. Here are the histograms of the 2 variables.


Sample Question 2 (cont’d)

• The first (exercise) is skewed to the right, with much of the data falling between 0 and 10.

• The second (heartrate) is a bit skewed to the left, with data falling more evenly on both sides of the mean (unlike the first).


Sample Question 2 (cont’d)

Below is the scatterplot of y = heart rate vs. x = exercise.

• b) What do you think the correlation coefficient is between these 2 variables?

• Is the coefficient positive or negative? In other words, when there is an increase in hours/wk of exercise, what happens to heartrate?


Sample Question 2 (cont’d)

Here is some SPSS output to describe the relationship between the variables exercise and heartrate:

Descriptive Statistics

             Mean     Std. Deviation    N
exercise     4.075    3.8535            20
heartrate    69.40    9.838             20

Correlations

                                   exercise    heartrate
exercise     Pearson Correlation   1           -.346
             Sig. (2-tailed)                   .135
             N                     20          20
heartrate    Pearson Correlation   -.346       1
             Sig. (2-tailed)       .135
             N                     20          20


Sample Question 2 (cont’d)

c) What would be the correlation between heartrate and exercise if exercise were measured in minutes per week instead of hours?

We can convert this in SPSS via the Transform -> Compute Variable menu option (see original SPSS handout for more information). The SPSS output is below:

Descriptive Statistics

             Mean     Std. Deviation    N
exercise     4.075    3.8535            20
heartrate    69.40    9.838             20

Correlations

                                   exercise    heartrate
exercise     Pearson Correlation   1           -.346
             Sig. (2-tailed)                   .135
             N                     20          20
heartrate    Pearson Correlation   -.346       1
             Sig. (2-tailed)       .135
             N                     20          20
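
The correlation stays at -.346 because r is built from standardized values, so multiplying every x by 60 changes nothing. A minimal Python sketch of this, using made-up numbers rather than the class survey data (the slides do not list the raw responses); the helper name pearson_r is only for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the survey responses).
exercise_hours = [0, 1, 2, 3, 5, 8, 10, 12]
heartrate = [80, 76, 72, 74, 68, 70, 62, 64]

# Same information, different units: hours -> minutes.
exercise_minutes = [60 * h for h in exercise_hours]

r_hours = pearson_r(exercise_hours, heartrate)
r_minutes = pearson_r(exercise_minutes, heartrate)

print(r_hours, r_minutes)  # the two values agree up to floating-point error
```

The factor of 60 appears once in the numerator and once in the denominator of r, so it cancels.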


Sample Question 2 (cont’d)

d) What is the equation for the best fit line for this data?

Remember, we use the y = b0 + b1x format, with b0 being the y-intercept (i.e. the value of y when x = 0, seen visually as the point where the line crosses the y axis) and b1 being the slope of the simple linear regression equation.

Coefficients (a)

                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t         Sig.
1  (Constant)     72.998     3.129                                            23.330    .000
   exercise       -.883      .565                -.346                        -1.564    .135

a. Dependent Variable: heartrate
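
Reading B from the table gives the line y hat = 72.998 - .883x. As a cross-check, in simple linear regression the slope and intercept can also be rebuilt from the earlier Descriptive Statistics and Correlations output, using b1 = r(sy/sx) and b0 = ybar - b1(xbar). A small Python sketch; the variable names are my own, and because the inputs are rounded the result only matches SPSS approximately:

```python
# Summary statistics copied from the SPSS output shown earlier.
mean_x, sd_x = 4.075, 3.8535   # exercise (hours per week)
mean_y, sd_y = 69.40, 9.838    # heartrate
r = -0.346                     # Pearson correlation

b1 = r * sd_y / sd_x           # slope
b0 = mean_y - b1 * mean_x      # intercept

print(f"slope b1     = {b1:.3f}")
print(f"intercept b0 = {b0:.3f}")
```

The slope agrees with the -.883 in the table; the intercept lands within about .002 of 72.998, a gap that comes from r being rounded to three decimals.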


Sample Question 2 (cont’d)

e) What would be the equation for the least squares line between heartrate and exercise if exercise were measured in minutes per week?

Using the new variable we computed and re-running the regression, we see that:

Coefficients (a)

                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t         Sig.
1  (Constant)     72.998     3.129                                            23.330    .000
   exercise_min   -.015      .009                -.346                        -1.564    .135

a. Dependent Variable: heartrate
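
The new slope and standard error are just the old ones divided by 60, while the intercept, Beta, t, and Sig. are untouched. A short Python sketch of that arithmetic, using the coefficients reported by SPSS (the variable names are illustrative):

```python
MINUTES_PER_HOUR = 60

b0 = 72.998          # intercept, unchanged by the unit conversion
b1_hours = -0.883    # slope per hour of exercise
se_hours = 0.565     # standard error of that slope

# One minute is 1/60 of an hour, so the per-minute slope is 1/60 as large.
b1_minutes = b1_hours / MINUTES_PER_HOUR
se_minutes = se_hours / MINUTES_PER_HOUR

# The t statistic is slope / standard error, so the factor of 60 cancels.
t_hours = b1_hours / se_hours
t_minutes = b1_minutes / se_minutes

# Same person, same prediction: 10 hours is 600 minutes.
pred_from_hours = b0 + b1_hours * 10
pred_from_minutes = b0 + b1_minutes * 600

print(round(b1_minutes, 3), round(se_minutes, 3))
```

Rounded to three decimals, these are the -.015 and .009 shown in the exercise_min row.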


Sample Question 2 (cont’d)

f) What is the estimated heartrate for a person who exercises 10 hours per week? How would this change if this person exercised an additional 5 hours per week?

y hat = b0 + b1x

y hat = 72.998 + (-.883)(10 hrs)

y hat = 72.998 - 8.83 = 64.168 is the predicted heartrate, given our equation, for a person who exercises 10 hours per week

y hat = 72.998 + (-.883)(15 hrs)

y hat = 72.998 - 13.245 = 59.753

The predicted heartrate drops by 5 × .883 = 4.415 with the additional 5 hours of exercise.
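
The same calculation can be written as a small Python function. This is only a sketch: predict_heartrate is a hypothetical name, and the default coefficients are the ones from the SPSS Coefficients table.

```python
def predict_heartrate(hours, b0=72.998, b1=-0.883):
    """Predicted heartrate from the fitted line y hat = b0 + b1 * x."""
    return b0 + b1 * hours

at_10 = predict_heartrate(10)
at_15 = predict_heartrate(15)
change = at_15 - at_10   # equals 5 * b1, the slope times the extra hours

print(round(at_10, 3), round(at_15, 3), round(change, 3))
```

The change depends only on the slope: each additional hour of exercise moves the prediction by b1, whatever the starting point.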


“r” versus “r²”

Correlation (represented as “r”) and proportion of variance (represented as “r²”) can be difficult to tell apart at face value.

Similarities:

They use the same letter to represent the two concepts

In simple linear regression (what we’ve been doing, with one x variable), r² is the square of the correlation (r) between the explanatory (x) and response (y) variables. Given r², one can recover the strength of the relationship by taking its square root; the sign of r matches the sign of the slope b1.
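
For the heartrate example, the round trip between r and its square can be sketched in Python as follows. The values come from the SPSS output; using the sign of the slope to restore the sign of r is a property of simple linear regression, not something SPSS prints.

```python
import math

r = -0.346     # Pearson correlation from the SPSS Correlations table
b1 = -0.883    # slope from the SPSS Coefficients table

# Squaring r gives the proportion of variance explained.
r_squared = r ** 2

# Going back: the square root gives the magnitude of r,
# and the sign is taken from the slope.
r_recovered = math.copysign(math.sqrt(r_squared), b1)

print(round(r_squared, 3), round(r_recovered, 3))
```

Here the squared value is about .12, so exercise accounts for roughly 12% of the variability in heartrate in this sample.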

Correlation (r)

From Kevin’s lecture, we know that correlation is the measure of the strength of the linear relationship between two variables

In SPSS, when using the Analyze -> Correlate -> Bivariate menu option, r is the value listed in the Pearson Correlation row where the two variables meet (a variable matched with itself shows a value of 1, since the data are the same)

Proportion of variance (r²)

Again, from Kevin (and he does a great job of explaining variance toward the end of Part 1 of lecture #3), r² is the fraction of the total variability in the values of y (measured irrespective of x) that is explained by your regression model.

In other words, r² is a way to discern how helpful your regression equation is in explaining the variability in y.
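
To make "fraction of variability explained" concrete, here is a minimal Python sketch that fits a least squares line and compares 1 - SSE/SST with the squared correlation. The data are made up for illustration (they are not the survey responses), and the helper name sums_of_squares is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (sxx, syy, sxy) about the means of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

# Hypothetical values for illustration only.
xs = [0, 1, 2, 3, 5, 8, 10, 12]          # exercise, hours per week
ys = [80, 76, 72, 74, 68, 70, 62, 64]    # heartrate

sxx, syy, sxy = sums_of_squares(xs, ys)

# Least squares line y hat = b0 + b1 * x.
b1 = sxy / sxx
b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)

# Total variability in y (ignoring x) and what the line leaves unexplained.
sst = syy
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_fit = 1 - sse / sst           # fraction of variability explained
r_squared_from_r = sxy ** 2 / (sxx * syy)    # square of the Pearson correlation

print(r_squared_from_fit, r_squared_from_r)  # the two agree up to rounding error
```

SST measures how much y varies around its own mean, SSE measures how much it still varies around the fitted line, and the share that disappears is r².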