puaf 610 ta

04/24/23 1

PUAF 610 TA

Session 10

TODAY

• Ideas about Final Review
• Regression Review

Final Review

• Any ideas about the final review next week?
• Go over lectures
• Go over problem sets related to the exam
• Go over extra exercises
• Try to get information from instructors
• Email me your preferences

Regression

• In regression analysis we analyze the relationship between two or more variables.

• The relationship between two or more variables could be linear or nonlinear.
 – Simple Linear Regression: y, x
 – Multiple Regression: y, x1, x2, x3, …, xk

• If a relationship exists, how can we use it to forecast future values?

Regression

• Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables.

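• Regression is thus often read as an explanation of causation, although a fitted relationship on its own shows association rather than proof of cause.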

[Figure: plot of the dependent variable against the independent variable (x)]

Simple Linear Regression


[Figure: fitted straight line, with the dependent variable (y) on the vertical axis]

• The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.

• Simple regression fits a straight line to the data.

$y' = b_0 + b_1 x \pm \varepsilon$

where $b_0$ is the y-intercept, $b_1 = \Delta y / \Delta x$ is the slope, and $\varepsilon$ is the error.



The function will make a prediction for each observed data point.

The observation is denoted by $y$ and the prediction is denoted by $\hat{y}$.

[Figure: dependent-variable axis showing, for one data point, the observation $y$ and the prediction $\hat{y}$ measured from zero]

For each observation, the variation can be described as:

$y = \hat{y} + \varepsilon$

Actual = Explained + Error

Simple Linear Regression

• Simple Linear Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$

• Simple Linear Regression Equation: $E(y) = \beta_0 + \beta_1 x$

• Estimated Simple Linear Regression Equation: $\hat{y} = b_0 + b_1 x$

• The simplest relationship between two variables is a linear one: $y = \beta_0 + \beta_1 x$
• x = independent or explanatory variable (“cause”)
• y = dependent or response variable (“effect”)
• $\beta_0$ = intercept (value of y when x = 0)
• $\beta_1$ = slope (change in y when x increases by one unit)

Interpret the slope

• Y=0.3+2.6x
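
One way to read this line: each one-unit increase in x raises the predicted Y by 2.6 units, and 0.3 is the predicted Y when x = 0. A minimal Python sketch of that reading (the function name `predict` and the x values 4 and 5 are illustrative assumptions, not from the slides):

```python
def predict(x):
    """Predicted Y from the slide's example line Y = 0.3 + 2.6x."""
    return 0.3 + 2.6 * x

# Raise x by one unit (from 4 to 5, arbitrary values) and look at the change in Y.
change = predict(5) - predict(4)
print(round(change, 6))  # equals the slope
print(predict(0))        # equals the intercept
```

The change does not depend on which starting x is chosen, which is what makes the relationship linear.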

Regression


[Figure: plot with the dependent variable on the vertical axis]

• A least squares regression, or OLS, selects the line with the lowest total sum of squared prediction errors.

• This value is called the Sum of Squares of Error, or SSE.
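
As a rough illustration of the least squares idea, the sketch below fits a line and compares its SSE with that of a slightly different line. The closed-form slope and intercept formulas are the standard textbook ones and are not stated on the slides; the data and the function names `fit_ols` and `sse` are made up for this example.

```python
def fit_ols(xs, ys):
    """Return (b0, b1) for the line that minimizes the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared prediction errors for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_ols(xs, ys)
best = sse(xs, ys, b0, b1)
# A line with a slightly steeper slope should do worse.
other = sse(xs, ys, b0, b1 + 0.1)
print(b0, b1, best, other)
```

For this data the fitted line is roughly y = 2.2 + 0.6x, and moving the slope away from the fitted value increases the SSE.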


Calculating SSR


[Figure: plot with the dependent variable on the vertical axis, marking the mean $\bar{y}$]

The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean of y, $\bar{y}$ (the sample mean of the observations).

Regression Formulas

The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically,

$SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation)

$SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation)

$SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y)
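
To check the decomposition numerically, the sketch below computes SSR (squared gaps between each prediction and the mean of y), SSE (squared gaps between each observation and its prediction), and SST (squared gaps between each observation and the mean of y) for a least squares line. The data are hypothetical, and the slope and intercept come from the standard closed-form formulas, which the slides do not show.

```python
# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept (standard closed-form formulas).
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
print(ssr, sse, sst)
```

The identity SST = SSR + SSE holds for the least squares line with an intercept; for an arbitrary line the two pieces generally do not add up to SST.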


The Coefficient of Determination

The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as $R^2$.

$R^2 = \frac{SSR}{SST} = \frac{SSR}{SSR + SSE}$

The value of $R^2$ can range between 0 and 1, and the higher its value, the larger the share of the variation in y that the regression explains.
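
A small sketch of the ratio, assuming the coefficient of determination is computed as SSR divided by SSR + SSE. The function name `r_squared` and the input numbers are hypothetical and serve only to show the arithmetic.

```python
def r_squared(ssr, sse):
    """Coefficient of determination: SSR / SST, where SST = SSR + SSE."""
    return ssr / (ssr + sse)

# Hypothetical sums of squares, chosen only to illustrate the calculation.
r2 = r_squared(ssr=3.6, sse=2.4)
print(round(r2, 3))
```

With these inputs about 60% of the total variation is explained by the regression.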

Testing for Significance

• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

• A t test is commonly used.

Testing for Significance: t Test

• Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$

• Test Statistic: $t = \frac{b_1}{s_{b_1}}$

• Rejection Rule: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.
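
The test can be sketched in a few lines, assuming the test statistic is the slope estimate divided by its standard error. The standard error formula used here (the standard error of the estimate divided by the square root of the sum of squared deviations of x) is the usual textbook one and is not given on the slides. The data and the function name `slope_t_test` are hypothetical, and the critical value must be looked up in a t table.

```python
import math

def slope_t_test(xs, ys, critical_value):
    """Two-tailed t test of H0: beta_1 = 0 in a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate, n - 2 df
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    t = b1 / s_b1
    reject = t < -critical_value or t > critical_value
    return t, reject

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# 3.182 is the tabulated two-tailed 5% t value for n - 2 = 3 degrees of freedom.
t, reject = slope_t_test(xs, ys, critical_value=3.182)
print(round(t, 3), reject)
```

Here t is about 2.12, which lies inside the interval from -3.182 to 3.182, so the null hypothesis is not rejected for this small sample.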

Multiple Linear Regression

• More than one independent variable can be used to explain variance in the dependent variable.

• A multiple regression takes the form:

$y = A + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

where k is the number of independent variables, each with its own slope parameter.

Multiple Regression

$\hat{y}_t = \underset{(0.1)}{0.6} + \underset{(0.4)}{0.4}\, x_t + \underset{(0.3)}{0.9}\, z_t, \qquad R^2 = 0.3$

(45 observations, standard errors in brackets)

Regression

• A unit rise in x produces 0.4 of a unit rise in y, with z held constant.

• Interpretation of the t-statistics remains the same: for x, t = (0.4 - 0)/0.4 = 1, which is below the critical value of 2.02, so we fail to reject the null and x is not significant.

• The R-squared statistic indicates that 30% of the variance of y is explained.
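
The same t ratio can be computed for every coefficient. The sketch below assumes the example regression reads as a constant of 0.6 (standard error 0.1), a coefficient on x of 0.4 (standard error 0.4), and a coefficient on z of 0.9 (standard error 0.3). Only the x figures and the critical value of 2.02 are confirmed by the bullet points above; the constant and z figures are an assumed reading of the slide.

```python
# Assumed reading of the example: (name, coefficient, standard error).
estimates = [("constant", 0.6, 0.1), ("x", 0.4, 0.4), ("z", 0.9, 0.3)]
critical_value = 2.02  # stated on the slide

results = {}
for name, coef, se in estimates:
    t = (coef - 0) / se                      # test of H0: coefficient = 0
    results[name] = (t, abs(t) > critical_value)

for name, (t, significant) in results.items():
    label = "significant" if significant else "not significant"
    print(name, round(t, 2), label)
```

Under that reading, x is not significant (t = 1), while the constant and z exceed the critical value.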

Adjusted R-squared Statistic

• This statistic is used in a multiple regression analysis, because it does not automatically rise when an extra explanatory variable is added.

• Its value depends on the number of explanatory variables.

• It is usually written as $\bar{R}^2$ (R-bar squared).

Adjusted R-squared

• It has the following formula (n = number of observations, k = number of parameters, including the intercept):

$\bar{R}^2 = R^2 - \frac{k - 1}{n - k}\,(1 - R^2)$
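
A quick numerical sketch, assuming the formula is R-bar squared = R-squared minus (k - 1)/(n - k) times (1 - R-squared), with k counting every estimated parameter including the intercept. The inputs reuse the earlier example (R-squared of 0.3, 45 observations); k = 3 is an assumption that the example has a constant plus two explanatory variables. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """R-bar squared for n observations and k parameters (intercept included)."""
    return r2 - (k - 1) / (n - k) * (1 - r2)

r2_bar = adjusted_r_squared(0.3, n=45, k=3)
print(round(r2_bar, 4))

# Adding a fourth parameter without improving R-squared lowers the adjusted value.
r2_bar_extra = adjusted_r_squared(0.3, n=45, k=4)
print(round(r2_bar_extra, 4))
```

The adjusted value comes out a little below 0.3 (about 0.267), and it falls further when a parameter is added without any gain in R-squared.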

F-test of explanatory power

• This is the F-test for the goodness of fit of a regression and in effect tests for the joint significance of the explanatory variables.

• It is based on the R-squared statistic.
• It is routinely produced by most computer software packages.
• It follows the F-distribution.

F-test formula

• The formula for the F-test of the goodness of fit is:

$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} \sim F_{k-1,\,n-k}$

F-statistic

• When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables jointly equal 0.

• If our F-statistic is below the critical value, we fail to reject the null and therefore say the goodness of fit is not significant.
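
The decision rule can be sketched as follows, assuming the statistic is F = (R-squared/(k - 1)) / ((1 - R-squared)/(n - k)) with k counting the intercept. The inputs reuse the earlier example (R-squared of 0.3, 45 observations), and k = 3 assumes a constant plus two explanatory variables. The critical value of 3.22 is an approximate 5% figure for (2, 42) degrees of freedom and should be confirmed against an F table.

```python
def f_statistic(r2, n, k):
    """F-test of goodness of fit; k counts all parameters, intercept included."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

f = f_statistic(0.3, n=45, k=3)

# Approximate 5% critical value for (2, 42) degrees of freedom.
# Treat this number as an assumption and check it against an F table.
critical_value = 3.22

if f > critical_value:
    decision = "reject the null: the explanatory variables are jointly significant"
else:
    decision = "fail to reject the null: the goodness of fit is not significant"
print(round(f, 2), decision)
```

With these inputs F is about 9, well above the assumed critical value, so the null of no joint explanatory power would be rejected.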
