slide 2 chapter 4 descriptive methods in regression and correlation

Upload: godwin-dorsey

Post on 25-Dec-2015

TRANSCRIPT

Page 1: Slide 2 Chapter 4 Descriptive Methods in Regression and Correlation

Slide 2

Chapter 4

Descriptive Methods in

Regression and Correlation

Slide 3

C4S1 – Linear Equation with One Independent Variable

A linear equation with one independent variable can be written as y = b0 + b1x,

where b0 and b1 are constants (fixed numbers), x is the independent variable, and y is the dependent variable.

The graph of a linear equation is a straight line. In the slope-intercept notation y = mx + b, the slope m corresponds to b1 and the y-intercept b corresponds to b0.

Linear Equations

Slide 4

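[Figure: graph of the straight line y = mx + b, with axes x and y, the y-intercept marked at (0, y), and the x-intercept marked at (x, 0).]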
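As a small illustration of the two intercepts on the graph, the sketch below computes them for a line written as y = b0 + b1x. The function name `intercepts` and the sample coefficients are assumptions made for this example; they do not come from the slides.

```python
def intercepts(b0, b1):
    """Return the y-intercept and x-intercept points of y = b0 + b1*x.

    A horizontal line (b1 == 0) has no single x-intercept, so None is
    returned for it in that case.
    """
    y_intercept = (0.0, b0)              # set x = 0
    if b1 == 0:
        x_intercept = None               # line never crosses (or lies on) the x-axis
    else:
        x_intercept = (-b0 / b1, 0.0)    # set y = 0 and solve for x
    return y_intercept, x_intercept


# Hypothetical line y = 4 - 2x
y_int, x_int = intercepts(4.0, -2.0)
print(y_int, x_int)
```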

Slide 5

Figure 4.6

Positive slope: the line rises from left to right (falls from right to left).

Negative slope: the line falls from left to right.

Horizontal line: the slope is 0.

Slide 6

Plotting the data in a scatterplot helps us visualize any apparent relationship between x and y. Generally speaking, a scatterplot (or scatter diagram) is a graph of data from two quantitative variables of a population.  To construct a scatterplot, we use a horizontal axis for the observations of one variable and a vertical axis for the observations of the other. Each pair of observations is then plotted as a point.

C4S2 – The Regression Equation

Slide 7

Because we could draw many different lines through the cluster of data points, we need a method to choose the “best” line.

The method, called the least-squares criterion, is based on an analysis of the errors made in using a line to fit the data points.

$\hat{y} = b_0 + b_1 x$

Slide 8

To avoid confusion, we use $\hat{y}$ to denote the y-value predicted for a value of x.

To measure quantitatively how well a line fits the data, we first consider the errors, e, made in using the line to predict the y-values of the data points.

In general, an error, e, is the signed vertical distance from the line to a data point. The error made in using the line to predict the y-value is $e = y - \hat{y}$.

To decide which line best fits the data, we compute the sum of the squared errors, $\sum e_i^2$.

The line with the smaller sum of squared errors is the one that fits the data better.
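The comparison of two candidate lines by their sum of squared errors can be sketched as follows. The four data points and the two candidate lines are made up for illustration, and the helper name `sum_squared_errors` is an assumption of this example.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = b0 + b1 * x      # predicted y-value
        e = y - y_hat            # signed vertical distance from line to point
        total += e ** 2
    return total


# Illustrative data (not from the slides)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

sse_a = sum_squared_errors(xs, ys, 0.0, 2.0)   # Line A: y_hat = 2x
sse_b = sum_squared_errors(xs, ys, 1.0, 1.0)   # Line B: y_hat = 1 + x

print(sse_a, sse_b)   # 1.0 11.0
better = "A" if sse_a < sse_b else "B"
print("Line", better, "fits these points better")
```

Line A has the smaller sum of squared errors here, so by this criterion it is the better of the two candidates. It is not necessarily the best of all possible lines.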

Slide 9

Slide 10

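The regression equation for a set of n data points is

$$\hat{y} = b_0 + b_1 x$$

where the slope is

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

and the y-intercept is

$$b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}$$

The mean of y is

$$\bar{y} = \frac{\sum y}{n}$$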
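The slope and y-intercept formulas for the regression equation translate directly into code. The sketch below is a minimal version using only the sums that appear in those formulas; the function name `least_squares` and the sample data are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y_hat = b0 + b1*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denom          # slope
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom     # y-intercept
    return b0, b1


# Illustrative data (not from the slides)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares(xs, ys)
print(b0, b1)
```

For these points the sums are Σx = 10, Σy = 19, Σx² = 30, and Σxy = 57, which give a slope of 38/20 = 1.9 and a y-intercept of 0.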

Slide 11

Extrapolation

Suppose that a scatterplot indicates a linear relationship between two variables.

Then, within the range of the observed values of the predictor variable, we can reasonably use the regression equation to make predictions for the response variable.

However, to do so outside that range, which is called extrapolation, may not be reasonable because the linear relationship between the predictor and response variables may not hold there.

Grossly incorrect predictions can result from extrapolation.
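To make the risk of extrapolation concrete, the sketch below fits a line to synthetic data that actually follows y = x² on the observed range 1 to 5, then predicts at x = 20. The data, the helper names, and the choice of x = 20 are all assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y_hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return b0, b1


# Synthetic data: the true relationship is y = x**2, observed only for x = 1..5
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
b0, b1 = least_squares(xs, ys)


def predict(x):
    return b0 + b1 * x


# Largest prediction error inside the observed range of x
max_error_inside = max(abs(y - predict(x)) for x, y in zip(xs, ys))

# Prediction error far outside the observed range (extrapolation)
x_far = 20
error_outside = abs(x_far ** 2 - predict(x_far))

print(max_error_inside, error_outside)
```

Within the observed range the line is never off by more than 2 units, but at x = 20 it predicts 113 where the true value is 400.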

Slide 12

Outliers and Influential Observations

An outlier is an observation that lies outside the overall pattern of the data.

In the context of regression, an outlier is a data point that lies far from the regression line, relative to the other data points.

An outlier can sometimes have a significant effect on a regression analysis.

We must also watch for influential observations.

In regression analysis, an influential observation is a data point whose removal causes the regression equation (and line) to change considerably.

A data point separated in the x-direction from the other data points is often an influential observation because the regression line is “pulled” toward such a data point without counteraction by other data points.
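One way to check whether a point is influential is to fit the regression line with and without it and compare the results. The sketch below does this for made-up data in which the last point is separated from the others in the x-direction; the data and helper names are assumptions of this example.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y_hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return b0, b1


# Illustrative data: the point (10, 2) sits far to the right of the rest
xs = [1, 2, 3, 4, 10]
ys = [2, 4, 5, 8, 2]

_, slope_all = least_squares(xs, ys)
_, slope_without = least_squares(xs[:-1], ys[:-1])   # drop the last point

print(slope_all, slope_without)
```

With all five points the slope is slightly negative (about -0.14); with the last point removed it is about 1.9. A change of that size is what marks the point as influential.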

Slide 13

Regression analysis is used when you want to show whether, and how, one variable can predict (or help explain) changes in another variable.

Slope of the best-fit line: $m = r \dfrac{s_y}{s_x}$, where r is the correlation between x and y, and $s_x$ and $s_y$ are the standard deviations of x and y.
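The relation between the slope of the best-fit line, the correlation, and the two standard deviations can be checked numerically. The sketch below uses Python's standard `statistics.stdev` for the sample standard deviations and computes r from its usual sums-of-products form; the data and the function name `correlation` are assumptions made for illustration.

```python
import math
import statistics


def correlation(xs, ys):
    """Linear correlation coefficient r of the paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data (not from the slides)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

r = correlation(xs, ys)
s_x = statistics.stdev(xs)   # sample standard deviation of x
s_y = statistics.stdev(ys)   # sample standard deviation of y

slope = r * s_y / s_x
print(round(slope, 6))
```

For these points the result agrees with the slope of 1.9 obtained from the least-squares formulas.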

Slide 14

C4S3 – The Coefficient of Determination

Slide 15

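The coefficient of determination, r², always lies between 0 and 1.

An r² near 0 suggests that the regression equation is not very useful for making predictions.

An r² near 1 suggests that the regression equation is quite useful for making predictions.

It shows whether predicting with the regression equation is an improvement over simply predicting with the mean of the observed y-values.

It is the proportion (often stated as a percentage) of the variation in the observed y-values that is explained by the regression.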

Slide 16

Regression Identity

The total sum of squares equals the regression sum of squares plus the error sum of squares.

SST = SSR + SSE

This identity always holds.
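The regression identity can be verified numerically for a least-squares line. The sketch below uses the standard definitions: SST is the sum of squared deviations of the observed y-values from their mean, SSR is the same for the predicted y-values, and SSE is the sum of squared errors. It also computes the coefficient of determination as SSR/SST. The data and helper names are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y_hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return b0, b1


# Illustrative data (not from the slides)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

b0, b1 = least_squares(xs, ys)
y_bar = sum(ys) / len(ys)
y_hats = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                 # total sum of squares
ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)     # regression sum of squares
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # error sum of squares

r_squared = ssr / sst   # coefficient of determination

print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))
```

For these points SST = 18.75, SSR = 18.05, and SSE = 0.70, so about 96% of the variation in the observed y-values is explained by the regression.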

Slide 17

C4S4 – Linear Correlation

We often hear statements such as “there is a positive correlation between x and y” and “x and y are uncorrelated.” These are explained in this section.

The linear correlation coefficient, r, measures the strength of the linear relationship between two variables.

Used for hand calculations

Reveals the meaning and basic properties

Slide 18

Understanding the Linear Correlation Coefficient

r is independent of the choice of units and always lies between −1 and 1.

If r is close to ±1, there is a strong linear relationship and the regression equation is extremely useful for making predictions. The data points are clustered closely about the regression line.

If r is near 0, the linear relationship is weak and the regression equation is a poor predictor. The data points are essentially scattered about a horizontal line.

Keep in mind that r measures the strength of the linear relationship between two variables and that the following properties of r are meaningful only when the data points are scattered about a line.

• r reflects the slope of the scatterplot.
• The magnitude of r indicates the strength of the linear relationship.
• The sign of r suggests the type of linear relationship.
• The sign of r and the sign of the slope of the regression line are identical.
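Two of the properties of r listed in this slide, independence from the choice of units and the meaning of its sign, can be seen in a short experiment. The height and weight values below are invented for illustration, and the function name `correlation` is an assumption of this example.

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r of the paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: heights in inches, weights in pounds
heights_in = [60, 62, 65, 68, 70]
weights_lb = [110, 120, 140, 155, 160]
r_original = correlation(heights_in, weights_lb)

# Same data in centimetres and kilograms: r should not change
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]
r_converted = correlation(heights_cm, weights_kg)

# Reversing the direction of one variable flips the sign of r
r_flipped = correlation(heights_in, [-w for w in weights_lb])

print(round(r_original, 4), round(r_converted, 4), round(r_flipped, 4))
```

The first two values agree up to floating-point rounding, and the third has the same magnitude with the opposite sign.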

Slide 19

Figure 4.17

Understanding the Linear Correlation Coefficient

To graphically portray the meaning of the linear correlation coefficient, we present various degrees of linear correlation in Fig. 4.17.

Slide 20

Relationship Between the Correlation Coefficient and the Coefficient of Determination

The coefficient of determination, r², is a descriptive measure of the utility of the regression equation for making predictions.

The coefficient of determination, r², equals the square of the linear correlation coefficient, r.

The linear correlation coefficient, r, is a descriptive measure of the strength of the linear relationship between two variables.

Because the linear correlation coefficient describes the strength of the linear relationship between two variables, it should be used as a descriptive measure only when a scatterplot indicates that the data points are scattered about a line.

Slide 21

Relationship Between the Correlation Coefficient and the Coefficient of Determination

When using the linear correlation coefficient, you must also watch for outliers and influential observations, because sample means and sample standard deviations are not resistant to outliers and other extreme values.

We cannot say that a value of r near 0 implies that there is no relationship between the variables, and we cannot say that a value of r near ±1 implies that a linear relationship exists. Such interpretations are meaningful only when the scatterplot indicates that the data points are scattered about a line.
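A short example shows why r near 0 does not rule out a relationship. In the synthetic data below, y is determined exactly by x through y = x², yet the linear correlation coefficient is 0 because the relationship is not linear. The data and the function name `correlation` are assumptions made for illustration.

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r of the paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data: a perfect, but curved, relationship y = x**2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```

A scatterplot of these points would show a parabola, not points scattered about a line, which is the situation in which r should not be used as a descriptive measure.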