SESSION 49 - 52
Last Update: 17th June 2011
Regression
Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fund-analysis.net/pages/vega.php
Learning Objectives
1. XY-Scatter Diagrams
2. Plotting the Regression Line
3. Coefficient Estimates
4. Pearson Coefficient of Correlation
5. Spearman Rank Correlation Coefficient
XY-Scatter Diagram
To draw a scatter diagram we need data for two variables. In applications where one variable depends to some degree on the other variable, the dependent variable is labeled Y and the other, called the independent variable, X. The values for X and Y are combined into a single data point using the observations for X and Y as coordinates.
Example Temperature - Truck
[Figure: XY-scatter of Trucks (y) against Temp (x)]

Obs    Temp (x)   Trucks (y)
1      11         2.5
2      14         6.5
3      20         8.5
4      21         10.5
5      23         11
6      24         12
7      26         13
8      28         13.5
9      30         15.5
10     34         19
Regression Analysis
Regression analysis is used to predict the value of one variable on the basis of one or more other variables. The first-order linear model describes the relationship between the dependent variable Y and the independent variable(s) X. The regression model with a as the y-intercept and b as the slope coefficient is of the form:
y = a + bx
Example Temperature - Truck
[Figure: XY-scatter of Trucks (y) against Temp (x) with the fitted regression line f(x) = 0.654323775118537 x − 3.91487920523821]
The estimators of the intercept a and slope coefficient b are based on drawing a straight line through the sample data:
ŷ = a + bx
Intercept and Slope
The intercept a is the y-coordinate of the point where the linear function intersects the y-axis. The slope coefficient b is defined as the change in y for a unit change in x.
Fitted Line With Residuals
The line drawn through the points is called the regression line.
Residuals Squared
The regression line, or least squares line, is the line that minimizes the sum of the squared vertical differences (residuals) between the points and the line.
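To illustrate the least squares idea, the following minimal sketch compares the sum of squared residuals of the fitted line from the temperature-truck example with two slightly different lines. The function name and the two alternative lines are illustrative assumptions, not part of the original slides.

```python
# Temperature (x) and truckloads sold (y) from the example table.
x = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
y = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]


def sum_squared_residuals(a, b):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Coefficients of the fitted line from the example plot, rounded to 6 decimals.
a_fit, b_fit = -3.914879, 0.654324

sse_fit = sum_squared_residuals(a_fit, b_fit)

# Hypothetical alternatives: shift the intercept, or tilt the slope slightly.
sse_shifted = sum_squared_residuals(a_fit + 0.5, b_fit)
sse_tilted = sum_squared_residuals(a_fit, b_fit + 0.02)

print(f"fitted line:    {sse_fit:.3f}")
print(f"shifted line:   {sse_shifted:.3f}")
print(f"tilted line:    {sse_tilted:.3f}")
```

Both alternative lines give a larger sum of squared residuals than the fitted line, which is what the least squares criterion requires.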
Calculating Coefficients
Raw data (y as the dependent variable, x as the independent variable), with the products xy and squares x^2 needed for the solution:

Solution (x = Temp, y = Trucks)

Obs    x     y      xy      x^2
1      11    2.5    27.5    121
2      14    6.5    91      196
3      20    8.5    170     400
4      21    10.5   220.5   441
5      23    11     253     529
6      24    12     288     576
7      26    13     338     676
8      28    13.5   378     784
9      30    15.5   465     900
10     34    19     646     1156
Total  231   112    2877    5779
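Step 1: Calculate the gradient (beta), i.e. the slope coefficient b, from the column totals (n = 10):
b = (n Σxy − Σx Σy) / (n Σx^2 − (Σx)^2)
  = (10 × 2877 − 231 × 112) / (10 × 5779 − 231^2)
  = 2898 / 4429
  ≈ 0.654324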
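Step 2: Calculate the intercept (alpha), i.e. the intercept a, from the sample means and the slope b ≈ 0.654324:
a = Σy / n − b × Σx / n
  = 112 / 10 − 0.654324 × 231 / 10
  = 11.2 − 15.114884
  ≈ −3.915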
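The two steps can be reproduced in a few lines of Python. This is a minimal sketch that assumes the standard sum-based least squares formulas, which is what the xy and x^2 columns of the solution table suggest. The variable names are illustrative.

```python
# Temperature (x) and truckloads sold (y) from the example table.
x = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
y = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]

n = len(x)
sum_x = sum(x)                                  # 231
sum_y = sum(y)                                  # 112
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 2877
sum_x2 = sum(xi ** 2 for xi in x)               # 5779

# Step 1: slope (gradient) from the column totals.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Step 2: intercept from the sample means and the slope.
a = sum_y / n - b * sum_x / n

print(f"b = {b:.6f}")
print(f"a = {a:.6f}")
```

The results agree with the fitted line shown in the example plot, f(x) = 0.6543 x − 3.9149.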
Interpreting the Coefficients
The slope coefficient b may be interpreted as the change in the dependent variable y for a one-unit change in x. In the previous example, a one-degree increase in temperature is associated with b = 0.654 additional truckloads of cool drinks sold.
The intercept a is the point at which the regression line and the y-axis intersect. If x = 0 lies far outside the range of sample values of x, the interpretation of the intercept is not straightforward. In the temperature-truck example, x = 0 lies well below the smallest sample value (x = 11). Taking the intercept literally would imply that at a temperature of x = 0, soft-drink sales fall to negative 3.915 truckloads!
Point Prediction
Upon obtaining the coefficient estimates, we can predict the outcome for various x (point prediction) between the minimum and maximum sample observations using the regression function y = a + bx. For example:
x = 16 degrees: y = −3.915 + 0.6543 × 16 = 6.554 ≈ 7 truckloads
x = 32 degrees: y = −3.915 + 0.6543 × 32 = 17.023 ≈ 17 truckloads
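Point prediction can be wrapped in a small helper. The sketch below uses the coefficients of the fitted line (rounded to 6 decimals) and rejects temperatures outside the sample range of 11 to 34 degrees, since the text restricts prediction to that range. The function name and the choice to raise an error are illustrative assumptions.

```python
# Coefficients of the fitted line from the example, rounded to 6 decimals.
A = -3.914879   # intercept
B = 0.654324    # slope

# Smallest and largest temperature observed in the sample.
X_MIN, X_MAX = 11, 34


def predict_truckloads(temp):
    """Point prediction y = a + b*x, restricted to the sample range of x."""
    if not X_MIN <= temp <= X_MAX:
        raise ValueError(
            f"temperature {temp} lies outside the sample range "
            f"[{X_MIN}, {X_MAX}]"
        )
    return A + B * temp


for temp in (16, 32):
    print(f"x = {temp}: y = {predict_truckloads(temp):.3f}")
```

Calling predict_truckloads(0) raises an error instead of returning the negative intercept, which mirrors the caution about interpreting the line far outside the sample values.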
Pearson Coefficient of Correlation
The Pearson coefficient of correlation R measures the linear association between variables and may be used to determine whether a linear relationship exists between y and x. Variables may be positively or negatively correlated: R = 1 denotes perfect positive correlation, R = −1 perfect negative correlation. R is defined for:
−1 ≤ R ≤ +1
Type of Relationship
[Figure: scatter panels of y against x for each type of relationship]
Direct linear relationship (small or wide dispersion): positive linear correlation exists, 0 < r < +1
Inverse linear relationship (small or wide dispersion): negative linear correlation exists, −1 < r < 0
No linear relationship: no correlation, r = 0
Coefficient of Determination
Squaring the Pearson coefficient of correlation gives the coefficient of determination R^2 in regression. It may be interpreted as the proportion of variation in the dependent variable y that is explained by the variation in the explanatory variable x. R^2 is a measure of the strength of the linear relationship between y and x.
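The "proportion of variation explained" reading can be checked numerically. The sketch below fits the least squares line to the temperature-truck data and computes R^2 as one minus the ratio of unexplained to total variation. The decomposition into SST and SSE is the standard one and is an assumption here, since the slides do not spell it out.

```python
# Temperature (x) and truckloads sold (y) from the example table.
x = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
y = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept from deviations about the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Total variation in y, and the part left unexplained by the fitted line.
sst = sum((yi - mean_y) ** 2 for yi in y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / sst
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, R^2 = {r_squared:.4f}")
```

An R^2 close to 1 means that almost all of the variation in truckloads is accounted for by the variation in temperature.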
Solution
Step 3: Calculate R and R^2 (x = Temp, y = Trucks)

Obs    x     y      xy      x^2     y^2
1      11    2.5    27.5    121     6.25
2      14    6.5    91      196     42.25
3      20    8.5    170     400     72.25
4      21    10.5   220.5   441     110.25
5      23    11     253     529     121
6      24    12     288     576     144
7      26    13     338     676     169
8      28    13.5   378     784     182.25
9      30    15.5   465     900     240.25
10     34    19     646     1156    361
Total  231   112    2877    5779    1448.5

R = (n Σxy − Σx Σy) / sqrt((n Σx^2 − (Σx)^2) × (n Σy^2 − (Σy)^2))
  = 2898 / sqrt(4429 × 1941)
  ≈ 0.988
R^2 ≈ 0.977
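Step 3 can be sketched in Python from the same column totals. The sum-based form of the Pearson formula is assumed here because the table provides exactly the xy, x^2 and y^2 columns it needs. Variable names are illustrative.

```python
from math import sqrt

# Temperature (x) and truckloads sold (y) from the example table.
x = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
y = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]

n = len(x)
sum_x = sum(x)                                  # 231
sum_y = sum(y)                                  # 112
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 2877
sum_x2 = sum(xi ** 2 for xi in x)               # 5779
sum_y2 = sum(yi ** 2 for yi in y)               # 1448.5

# Pearson coefficient of correlation from the column totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Coefficient of determination.
r_squared = r ** 2

print(f"R   = {r:.4f}")
print(f"R^2 = {r_squared:.4f}")
```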
Spearman Rank Correlation
The standard coefficient of correlation allows for determining whether there is evidence of a linear relationship between two interval variables. When the variables are ordinal, or when both variables are interval but the normality requirement is not satisfied, a nonparametric statistic called the Spearman Rank Correlation Coefficient may be used instead.
Objective: Comparing 2 Variables
Analyzing the relationship between two variables. Data type?
- Nominal: Chi-Square test of a contingency table
- Ordinal: Spearman Rank Correlation
- Interval: Population distribution?
  - Error is normal or x and y bivariate normal: Simple linear regression
  - x and y not bivariate normal: Spearman Rank Correlation
Example
The list below shows organizational strengths that were ranked independently by management and staff. The managing director wished to know how closely the two assessments were correlated:

Ranking
Business Aspect           Management   Staff
Brand Equity              1            1
Financial Controls        2            3
Customer Service          3            2
Planning Systems          4            6
Research & Development    5            4
Company Morale            6            7
Productivity              7            5
Calculating RS
Ranking (d = Management rank − Staff rank)
Business Aspect           Obs   Management   Staff   d     d^2
Brand Equity              1     1            1       0     0
Financial Controls        2     2            3       -1    1
Customer Service          3     3            2       1     1
Planning Systems          4     4            6       -2    4
Research & Development    5     5            4       1     1
Company Morale            6     6            7       -1    1
Productivity              7     7            5       2     4
Total                                                      12
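The table supplies the total of the squared rank differences. A minimal sketch of the remaining calculation follows, assuming the standard Spearman formula for rankings without ties, r_s = 1 − 6 Σd^2 / (n (n^2 − 1)). The formula is not shown in this section, so its use here is an assumption.

```python
# Ranks assigned by management and by staff, in the order of the table
# (Brand Equity, Financial Controls, ..., Productivity).
management = [1, 2, 3, 4, 5, 6, 7]
staff = [1, 3, 2, 6, 4, 7, 5]

n = len(management)

# Squared rank differences d^2 for each business aspect.
d_squared = [(m - s) ** 2 for m, s in zip(management, staff)]
sum_d2 = sum(d_squared)   # matches the table total of 12

# Spearman rank correlation (standard formula, valid when there are no ties).
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}")
print(f"r_s = {r_s:.4f}")
```

A value of r_s near +1 indicates that management and staff rank the business aspects in a broadly similar order.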