Chapter 3 Review Two Variable Statistics Veronica Wright Christy Treekhem River Brooks

Upload: abraham-evans

Post on 17-Dec-2015

Chapter 3 Review: Two Variable Statistics

Veronica Wright

Christy Treekhem

River Brooks

The Big Idea

• This chapter explains how scatterplots represent the relationship between two quantitative variables. They make it easy to spot the direction, form, and strength of a trend and help to isolate outliers. Residual plots are a further tool for judging whether a linear model suits the data. The LSRL can be used to predict results from the data, and the correlation coefficient and coefficient of determination measure how well the line fits, and therefore how far those predictions can be trusted.

• We use these ideas all the time in statistics and just about everywhere else. Anyone who reads a scatterplot is already applying several of these principles to infer information from it, most obviously how things are likely to develop given the data collected so far. Almost every field relies on this: economics, politics, manufacturing, and even sports.

Important Vocabulary

• Direction – whether the response variable tends to increase (positive) or decrease (negative) as the explanatory variable increases

• Scatterplot – a graph that shows the relationship between 2 quantitative variables that are measured on the same individuals

• Response variable – a variable that measures the outcome of a study, also called the dependent variable

• Explanatory variable – a variable that influences the response variable, also called the independent variable

• Form – the shape that the data resembles when displayed on a scatterplot, e.g. curved, linear, exponential, etc.

• Strength – how closely the data points follow the form

• Outlier – a data point that doesn’t follow the form as closely as all the others, a data point that seems significantly out of place on a scatterplot

• Correlation coefficient – a measure of the direction and strength of the linear relationship between two quantitative variables, usually represented as r

Important Vocabulary

• Regression line – a line that describes how the response variable changes when the explanatory variable changes

• Extrapolation – using the regression line to predict results beyond the range of the actual data; such predictions are often unreliable

• LSRL, the least-squares regression line – the line that makes the sum of the squared vertical distances from the data points as small as possible: ŷ = a + bx

• Residual – the difference between an actual data point and where the regression line says that particular data point should fall: residual = observed y - predicted ŷ

• Residual plot – a scatterplot of the data’s residuals against its explanatory variable

• Coefficient of determination – the proportion of the variability in the response variable that is accounted for by the LSRL. The higher the coefficient, the more closely the LSRL fits the data. It is usually shown as r^2 and is always between 0 and 1.

• Lurking variable – a variable other than the response and explanatory variables that may influence the relationship between them
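The LSRL, residual, and coefficient of determination defined above can be sketched in a few lines of Python. This is only an illustration: the data set and the helper name `least_squares_line` are made up, and the slope and intercept come from the standard least-squares formulas rather than from anything shown on the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-deviations and squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: explanatory variable first, response second
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

a, b = least_squares_line(hours, scores)
predicted = [a + b * x for x in hours]

# Residual = observed y minus predicted y-hat
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

# Coefficient of determination: share of the variation in y explained by the line
y_bar = sum(scores) / len(scores)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - y_bar) ** 2 for y in scores)
r_squared = 1 - ss_res / ss_tot

print(f"LSRL: y-hat = {a:.2f} + {b:.2f}x")   # LSRL: y-hat = 2.20 + 0.60x
print(f"r^2 = {r_squared:.2f}")              # r^2 = 0.60
```

The residuals of a least-squares line always add up to zero (apart from rounding), which is a quick check that the line was computed correctly.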

Key Topics Covered in the Chapter

• How to graph and determine the relationship between independent (explanatory) and dependent (response) variables

• Correlation – how to find it, and what it means

• Regression line (Best fit, LSRL) – how to find it, what it means, and how well it fits the data

Formulas You Ought to Know

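• The regression line formula (LSRL):

– ŷ = a + bx

• With ŷ being the predicted response, a being the y-intercept, b being the slope, and x being the explanatory variable.

• The formula for the mean:

– $\bar{x} = (x_1 + x_2 + \dots + x_n)/n$

• The formula for standard deviation:

– $s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$

• The formula for r (correlation coefficient):

– $r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$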
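As a rough check on the formulas for the mean, the standard deviation, and r, here is a small Python sketch that computes each one directly. The data and function names are hypothetical, the standard deviation is assumed to be the sample version (dividing by n - 1), and the relations b = r·Sy/Sx and a = ȳ - b·x̄ used at the end are the standard textbook ones rather than something stated on the slide.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
b = r * sample_sd(ys) / sample_sd(xs)   # slope of the LSRL
a = mean(ys) - b * mean(xs)             # intercept of the LSRL

print(f"r = {r:.4f}")                         # r = 0.7746
print(f"y-hat = {a:.2f} + {b:.2f}x")          # y-hat = 2.20 + 0.60x
```

Because r is built from standardized values, it has no units and does not change if x or y is rescaled.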

Calculator Key Strokes

• In this unit, we need to use our calculator to find Sx, Sy, the mean of x, the mean of y, r, r^2, and the LSRL, as well as to graph the scatterplot and the residual plot.

• To find r^2, r, and the LSRL, do the following:

(enter data sets into L1 and L2) Insert your lists in the order (Explanatory List, Response List)

• To find Sx, Sy, the mean of x, or the mean of y, do all of the above, except press “2” instead of “8”

Calculator Key Strokes

• To plot the scatterplot, do the following:

• Enter the data set

• Make sure the plot is On

• Choose the Scatterplot

• To find the residual plot, do all of the above, except change “Ylist” to “Resid”

• (If you cannot find the RESID button in your Statlist, do the following): (Scroll down to DiagnosticOn)

• And now it should work, but MAKE SURE that you have already calculated the LSRL.

Example Problems

• A study shows that there is a positive correlation between the size of the hospital and the median number of days patients remain in the hospital. Does this mean you can shorten a stay by choosing a small hospital? Explain.

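• No, correlation is not causation. The seriousness of the illness is a likely lurking variable: patients with minor injuries may not feel the need to go to a larger hospital, while serious cases that require long stays tend to end up in large hospitals. Choosing a small hospital would not shorten the stay.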

Example Problems

• The Standard and Poor 500 index is an average of the price of 500 stocks. There is a moderately strong correlation (r equals approximately 0.6) between how much this index changes in January and how much it changes during the entire year. If we looked instead at data on all 500 individual stocks, we would find a very different correlation. Would the correlation be higher or lower? Why?

• The correlation would be lower; the individual stock performances will be more variable, weakening the relationship.
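The effect of averaging can be illustrated with a small, entirely made-up data set (not stock data). In this sketch there are five groups; within each group the individual y values are spread symmetrically around the group's x value, so the group averages fall exactly on a line while the individuals scatter around it. The helper `correlation` uses the algebraically equivalent form r = Sxy / sqrt(Sxx·Syy).

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 5 groups, 3 individuals per group
group_x = [1, 2, 3, 4, 5]
offsets = [-3, 0, 3]   # individual deviations from the group's typical y

# Individual-level data: every individual in a group shares the group's x
ind_x = [x for x in group_x for _ in offsets]
ind_y = [x + d for x in group_x for d in offsets]

# Group-level data: one averaged y per group
avg_y = [sum(x + d for d in offsets) / len(offsets) for x in group_x]

r_individual = correlation(ind_x, ind_y)
r_average = correlation(group_x, avg_y)

print(f"r for individuals: {r_individual:.2f}")   # r for individuals: 0.50
print(f"r for averages:    {r_average:.2f}")      # r for averages:    1.00
```

Averaging removes the individual-to-individual variation, so the averaged data look far more tightly related than the individuals really are.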

Example Problems

• A study of elementary school children ages 6-11 finds a high positive correlation between shoe size x and score y on a test of reading comprehension. What explains this correlation?

• Age is a lurking variable. We would expect both quantities to increase with age.

Example Problems

• A college newspaper interviews a psychologist about student ratings of the teaching of faculty members. The psychologist says, “The evidence indicates that the correlation between research productivity and teaching rating of faculty members is close to zero.” The paper reports this as “Professor McDaniel said that good researchers tend to be poor teachers, and vice versa.” Explain why this is wrong, and explain the psychologist's meaning.

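• Professor McDaniel did not say that good researchers make poor teachers; that would be a negative correlation. A correlation close to zero means there is little or no linear relationship between research productivity and teaching rating, so good researchers are about as likely to be good teachers as poor ones.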

Example Problems

• Explain why this is wrong: “There is a high correlation between gender of American workers and their income.”

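• Gender is categorical, not quantitative. Correlation is only defined for two quantitative variables, so the statement should describe an association or relationship instead.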

Example Problems

• Explain the error: “We found a high correlation (r=1.09) between students' ratings of teaching and ratings made by other faculty members.”

• r must be between -1 and 1, so a value of 1.09 is impossible.
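The correlation coefficient always lies between -1 and 1, with the sign giving the direction of the relationship. The short Python sketch below, using invented data and a hypothetical helper `correlation`, shows the two extremes and one value in between.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]

r_up = correlation(xs, [2, 4, 6, 8])      # points exactly on a rising line
r_down = correlation(xs, [8, 6, 4, 2])    # points exactly on a falling line
r_mixed = correlation(xs, [1, 3, 2, 4])   # rising trend with some scatter

print(f"perfect positive: {r_up:.2f}")     # perfect positive: 1.00
print(f"perfect negative: {r_down:.2f}")   # perfect negative: -1.00
print(f"scattered:        {r_mixed:.2f}")  # scattered:        0.80
```

A reported value outside this range, such as 1.09, can only come from an arithmetic or reporting mistake.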

Helpful Hints

• Some people can’t find the RESID button to get the residual plot, or they get an error. That’s because you need to find the LSRL first; the residuals do not exist until you do.

• If you can’t see anything when you plot your scatterplot, press Zoom -> 9.

• If the RESID plot shows any type of pattern, a straight line (LSRL) is not the right model. A different model, perhaps a power or exponential one if the pattern is curved, would suit the data better.
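To see what a patterned residual plot looks like in numbers, the sketch below fits a least-squares line to deliberately curved, made-up data (y = x^2) and lists the residuals. The helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical curved data: y is exactly x squared
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(residuals)   # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)       # +---+
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern that signals the straight line is missing a curve in the data. Residuals from a suitable model would instead scatter around zero with no visible shape.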

The End
