british biometrician sir francis galton was the one who used the term regression in the later part...

49
REGRESSION ANALYSIS British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century.

Upload: duane-taylor

Post on 24-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

REGRESSION ANALYSIS

British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century.

Page 2: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Meaning of Regression Analysis

Regression means the estimation or prediction of the unknown value of dependent variable from the known value of independent variable. In other words, regression analysis is a mathematical measure of the average relationship between two or more variables.

Definition of RegressionBy M.M. Blair- Regression is the measure of the average

relationship between two or more variables.

By Hirsch- regression analysis measures the nature and extent of the relation between two or more variables, thus enables us to make predictions.

Page 3: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Utility of Regression 1. Regression analysis is essential for planning and

policy making: It is widely used to estimate the coefficient of economic relationship. These coefficient helps in the formulation of economic policy of govt.

2. Testing Economic Theory: Regression analysis is one of the basic tool to test the accuracy of economic theory.

3. Prediction: By regression analysis, the value of dependent variable can be predicted on the basis of the value of an independent variable. For example, if price of a commodity rises, what will be the probable fall in demand, this can be predicted by regression.

4. Regression is used to study the functional relationship:

In agriculture, it is important to study the response of fertilizer.

Page 4: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Difference between Regression and CorrelationBasis Correlation Regression

1. Degree and nature of Relationship

Correlation is a measure of degree of relationship between X and Y.

Regression studies the relationship between two variables so that one may be able to predict the value of one variable on the basis of another.

2. Cause and Effect Relationship

Correlation does not always assume cause and effect relationship.

Regression analysis expresses the cause and effect relationship.

3. Prediction Correlation does not make any prediction.

Regression analysis enable us to make prediction.

4. Non-sense Correlation:

In Correlation analysis, sometimes there is a non-sense correlation like between rise in income and rise in weight

In regression analysis there is nothing like non-sense regression.

Page 5: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Types of Regression Analysis1. Simple and Multiple Regression: In simple regression

analysis, we study only two variables at a time, in which one variable is dependent and another is independent. The relationship between income and expenditure is an example of simple regression. In multiple regression analysis, we study more than two variables at a time in which one is dependent variable and others are independent variable. The study of effect of rain and irrigation on yield of wheat is an example of multiple regression.

2. Linear and Non-linear Regression: When one variable changes with other variable in fixed ratio this is called linear regression. When one variable varies with other variable in changing ratio, then it is called as non-linear regression.

3. Partial and Total Regression: When two or more variables are studied for functional relationship but at a time, relationship between two variables is studied and the other variables are constant, it is known as partial regression. When all the variables are studied simultaneously for the relationship among them is called total regression.

Page 6: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Simple Linear Regression

Regression Lines

Regression Equations

Regression Coefficients

Page 7: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Lines

The Regression Line shows the average relationship between two variables. This is also known as the line of Best Fit. If two variables X and Y are given, then there are two regression lines related to them:

1. Regression Line of X on Y: The regression line of X on Y gives the best estimate for the value of X for any given value of Y

2. Regression Line of Y on X: The regression line of Y on X gives the best estimate for the value of Y for any value of X.

Page 8: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Method of obtaining Regression lines

1. Scatter diagram method

2. Least Square method

Page 9: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Scatter diagram method

This is the simplest method of constructing regression lines. In this method, values of related variables are plotted on a graph. A straight line is drawn passing through the plotted points. The straight line is drawn with freehand. The shape of regression line can be linear or non-linear.

Page 10: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Least Square Method Regression Lines can also be constructed by this method.

Under this method a regression line is fitted through different points in such a way that the sum of squares of the deviations of the observed values from the fitted line shall be least. The line drawn by this method is called Line of Best Fit. Under this method two lines regression lines are drawn in such a way that sum of squared deviations becomes minimum.

Page 11: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equations

Regression equations are the algebraic formulation of regression lines. Regression Equations represent regression lines. There are two regression equations:

Regression Equation of Y on X

Regression equation of X on Y

Page 12: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equation of Y on X

This equation is used to estimate the probable value of Y on the basis of the given values of X. This equation can be expressed in the following way:

Y=a+bX Here a and b are constants Regression equation of Y on X can also be

presented in another way: Y−Y¯=byx(X—X¯)

Page 13: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equation of X on Y

This equation is used to estimate the probable values of X on the basis of the given values of Y. This equation is expressed in the following:

X=a+bY Here a and b are constants Regression equation of X on Y can also be written

in the following way: X—X¯=bxy(Y—Y¯¯)

Page 14: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Coefficients There are two regression coefficients. Regression

coefficients measures the average change in the value of one variable for q unit change in the value of another variables. Regression coefficient represents the slope of a regression line. There are two regression coefficients:

Regression coefficient of Y on X

Regression coefficient of X on Y

Page 15: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression coefficient of Y on X

This coefficient shows that with a unit change in the value of X variable, what will be the average change in the value of Y variable. This is represented byx. Its formula is as follows:

byx=r.y∕x

Regression coefficient of X on Y This coefficient shows that with a unit change in the value of

Y variable, what will be the average change in the value of X-variable. It is represented by bxy. Its formula is as follows:

bxy=r.x∕y

Page 16: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Properties of Regression Coefficients

Following are the properties of regression coefficients: 1. Coefficient of correlation is the geometric

mean of the regression coefficients. r=√bxy×byx 2. Both the regression coefficients must have

same sign- This means either both regression coefficients will either be positive or negative. In other words when one regression coefficient is negative, the other would also be negative. It is never possible that one regression coefficient is negative while the other is postive.

Page 17: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

3. The coefficient of correlation will have the same sign as that of regression coefficients- If both regression coefficients are negative, then the correlation would be negative. And if byx and bxy have positive signs, then r will also take plus sign.

4. Both regression coefficients cannot be greater than unity- If one regression coefficient of y on x is greater than unity, then the regression coefficient of x on y must be less than unity. This is because

r=√byx.bxy=±1

Page 18: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

5. Arithmetic mean of two regression coefficients is either equal to or greater than the correlation coefficient. In terms of the formula:

byx+bxy÷2≥r6. Shift of origin does not affect regression

coefficients but shift in scale does affect regression coefficient- Regression coefficients are independent of the change of origin but not of scale.

Page 19: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Methods to obtain Regression Equations

Regression Equations in case of Individual Series

Regression Equations in case of Grouped Data

Page 20: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

In case of Individual Series

In Individual series, regression equations can be worked out by two methods:

Using Normal Equations

Using regression Coefficients

Page 21: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equations using Normal Equations

This method is also called Least Square Method. Under this method, computation of regression equations is done by solving two normal equations.

Regression Equation of Y on X Y=a+bX Under least square method, the values of a and b are obtained by using the

following two normal equations: ∑Y=Na+b∑X ∑XY=a∑X+b∑X² Solving these equations, we get the following value of a and b: bxy= N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² a=Y-bX Finally a and b are put in the equation: Y=a+bX

Page 22: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equation of X on Y

Regression Equation of X on Y is expressed as follows: X=a+bY Under least Square Method the value of a and b are obtained

by using the following normal equations: ∑X=Na+b∑Y ∑XY=a∑Y+b∑Y² Solving these equations we get the value of a and

b The calculated value of a and b are put in the equation:

X=a+bY

Page 23: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example

Calculate the regression equation of X on Y from the following by least square method

X Y X² Y² XY

1 2 1 4 2

2 5 4 25 10

3 3 9 9 9

4 8 16 64 32

5 7 25 49 35

N=5 ∑X=15

∑Y=25 ∑X²=55 ∑Y²=151 ∑XY=88

Page 24: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Solution

Regression Equation of X on Y X=a+bY The two normal equations are: ∑X=Na+b∑Y ∑XY=a∑Y+b∑Y² Substituting the values we get 15=5a+25b 88=25a+151b Multiplying 1 and 2 88=25a+151b 75=25a+125b − − − 13=26b b=13÷26=0.5

Page 25: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Putting the value of b in equation1 15=5a+25×0.5 15=5a+12.5 5a=2.5 a=0.50 Therefore: X=0.5+0.5Y

Page 26: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression Equation using Regression Coefficients

Regression equations can also be computed with the help of regression coefficients. For this we will have to find out, X‾, Y‾, byx, bxy from the data. Regression equations can be computed from the regression coefficients by following method:

1. Using actual values of X and Y 2. Using deviations from Actual Means 3. Using deviations from Assumed Means 4. Using r, x, y, and X‾, Y‾.

Page 27: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Using Actual values of X and Y series

In this method , actual values of X and Y are used to determine regression equations. Regression equation are put in the following way:

Regression Equation of Y on X Y-Y‾=byx(X-X‾) Using the actual values, the value of byx can be calculated

as: byx=N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² Regression Equation of X on Y X−X‾=bxy(Y−Y‾) Using the actual values bxy can be calculated

as : bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)²

Page 28: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example

Obtain two lines of equations from the following :

X Y XY X² Y²

2 5 10 4 25

4 7 28 16 49

6 9 54 36 81

8 8 64 64 64

10 11 110 100 121

∑X=30 ∑Y=40 ∑XY=266 ∑X²=220 ∑Y²=340

Page 29: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression coefficient of Y on X byx=N.∑XY−(∑X) (∑Y)÷N.∑X²−(∑X)² =5×266−30×40÷5×220−(30)² =1330−1200÷1100−900 =130÷200 =0.65 Regression coefficient of X on Y bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)² =5×266−30×40÷5×340−(40)² =1330−1200÷1700−1600 =130÷100 =1.30 Y−Y‾=byx(X−X‾) Y‾=∑Y÷N =40÷5 =8 X‾=∑X÷N =30÷5 =6

Page 30: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression equation of Y on X Y−Y‾=byx(X−X‾) Y−8=0.65(X− 6) Y=4.1+0.65X Regression equation of X on Y X−X‾=bxy(Y−Y‾) X−6=1.30(Y−8) X=−4.40+1.30Y

Page 31: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Using Deviations take from Actual Means

When the size of the values of X and Y is very large, then the method using actual values becomes very difficult to use. In such case, deviations taken from arithmetic means are used.

Regression equation of Y on X Y−Y‾=byx(X−X‾) Using deviations from actual means: byx=∑xy÷∑x² Where, x=X−X‾, y=Y−Y‾ Regression equation of X on Y X−X‾=bxy(Y−Y‾) Using deviations from actual means: bxy=∑xy÷∑y²

Page 32: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example

Obtain the two regression equations from the following:

X Y X‾=7X−X‾x

x² Y‾=5Y−Y‾y

Y² xy

2 4 −5 25 −1 1 5

4 2 −3 9 −3 9 9

6 5 −1 1 0 0 0

8 10 1 1 5 25 5

10 3 3 9 −2 4 −6

12 6 5 25 1 1 5

∑X=42N=6

∑Y=30 ∑x=0 ∑x²=70

∑y=0 ∑y²=40

∑xy=18

Page 33: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

SolutionSince the actual means of X and Y are whole numbers we should take

deviations from X‾and Y‾ byx=∑xy÷∑x² =18÷70 =0.257 bxy=∑xy÷∑y² =18÷40 =0.45 Regression equation of Y on X Y−Y‾=byx(X−X‾) Y−5=0.257(X−7) Y−5=0.257X−7 Y=0.257X+3.201 Regression equation of X on Y X−X‾=bxy(Y−Y‾) X−7=0.45(Y−5) X−7=0.45Y−2.25 X=0.45Y=4.75

Page 34: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Using Deviations taken from Assumed Means

When actual means turn out to be in fractions rather than the whole number like 24.14, 56.89 etc, then deviations from assumed means rather than actual means are used. Following are the regression equations:

Regression equation of Y on X Y−Y‾=byx(X−X‾) Using deviations from the assumed means, value of byx

can be calculated as: byx=N×∑dxdy−∑dx.∑dy÷N.∑dx²−(∑dx)²

Page 35: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Regression equation of X on Y X−X¯=bxy(Y−Y¯) Using deviations from assumed means the value of bxy

can be calculated: bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑dy)² Where dx=X−Ax, dy=Y−Ay

Page 36: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example Obtain the two regression equations for the

following data:

X Y A=69dx

dx² A=112dy

dy² dxdy

78 125 9 81 13 169 117

89 137 20 400 25 625 500

97 156 28 784 44 1936 1232

69=A 112=A 0 0 0 0 0

59 107 −10 100 −5 25 50

79 136 10 100 24 576 240

68 124 −1 1 12 144 −12

61 108 −8 64 −4 16 32

N=8∑X=600

∑Y=1005

∑dx=48

∑dx²=1530

∑dy=109

∑dy²=3491

∑dxdy=2159

Page 37: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Solution

byx=N.∑dxdy−∑dx.∑dy÷N.∑dx²−(dx)² =8×2159−(48)(109)÷8×1530−(48)² =17272−5232÷12240−2304 =12040÷9936 =1.212 Regression equation of Y on X Y−Y¯=byx(X−X¯) Y−125.625=1.212(X−75) Y−125.625=1.212X−90.9 Y=1.212X+34.725

Page 38: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑ dy)² =8×2159−(48)

(109)÷8×3491−(1005)² =17272−5232÷27928−1010025 =12040÷−9827097 =−0.001 Regression equation of X on Y X−X¯=bxy(Y−Y¯) X−75=−0.001(Y−125.625) X−75=−0.001Y+0.125 X=−0.001Y+9.375

Page 39: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Obtain Regression Equations from Coefficient of Correlation, Standard Deviations and Arithmetic Means of X and Y

When the values of X¯, Y, x, y and r of X and Y series are given then regression equations are expressed in the following way:

1. Regression Equation Of Y on X Y−Y¯=byx(X−X¯) Where, byx=r.y÷x 2. Regression equation of X on Y X−X¯=bxy(Y−Y¯) Where, bxy=r.x÷y

Page 40: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example

You are given the following information: Obtain two regression equations:

X Y

Arithmetic Mean: 5 12

Standard deviation: 2.6 3.6

Correlation Coefficient:

r=0.7

Page 41: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Solution

1. Regression Equation Of X on Y: X−X¯=r.x÷y(Y−Y¯) Putting the values in the equation we get: X−5=0.7×2.6÷3.6(Y−12) X−5=0.51(Y−12) X−5=0.51Y−6.12 X=0.51Y−1.12

Page 42: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

2. Regression equation of Y on X Y−Y¯=r.y÷x(X−X¯)Putting the values in the equation, we get Y−12=0.7×3.6÷2.6(X−5) Y−12=0.97(X−5) Y−12=0.97X−4.85 Y=0.97X+7.15

Page 43: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Obtain Regression Equations in case of Grouped Data

For obtaining regression equations from grouped data, first of

all we have to construct a correlation table. Special adjustments must be made while calculating the value of regression coefficients because regression coefficients are independent of change of origin but not of scale. In grouped data the regression coefficients are computed by using the formula: 1.bxy=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdy²−(∑fdy)²×ix÷iy

2.byx=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdx²−(∑fdx)²×iy÷ix

Page 44: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Obtain the Mean values and Correlation Coefficient from the Regression Equations

1. To find the Mean Values from the Regression Equations: Two regression lines intersect each other at mean value(X¯ and Y‾ points.

2. To find the Coefficient of Correlation from two Regression Equations: Correlation coefficient can be worked out from the regression coefficients bxy and byx.

r=√byx.bxy

Page 45: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Example From the following two regression equations, identify which

one is X on Y and which one is Y on X 2X+3Y=42 X+2Y=26

Solution In the absence of any clear cut indication, Let us assume that

equation first to be Y on X and equation second to be X on Y Let equation 1 be regression equation of Y on X 2X+3Y=42 3Y=42−2X Y=42÷3−2X÷3

Page 46: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

From this it follows that byx=Coefficient of X in 1=−2÷3Now equation 2 be regression equation on X on Y

X+2Y=26 X=26−2YFrom this it follows that bxy=coefficient of Y in 2=−2Now, we calculate ‘r’ on the basis of the above values of two

regression coefficients we get: r²=byx.bxy =−2÷3×−2 = 4÷3>1

Page 47: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Here, r²>1 which is impossible as r²≤1. So our assumption is wrong. We now choose equation 1 as regression of X on Y and 2 as regression equation of Y on X:

Assuming the first equation as on X on Y, we have 2X+3Y=42 2X=42−3Y X=42÷2−3Y÷2 From this, it follows that bxy=Coefficient of Y in above equation=−3÷2 Now, assuming the second equation as Y on X, we have, X+2Y=26 2Y=26−X Y=26÷2−1÷2X From this it follows that byx=Coefficient of X in above equation=−1÷2

Page 48: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century

Now, r²=bxy.byx =−3÷2×−1÷2 =3÷4 =0.75 Here, r²<1 which is possible. r² is within

the limit , r²≤1. Hence, it is proved that the first equation

is of X on Y and the second equation is of Y on X

Page 49: British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century