srit / picm105 sfm / correlation and regression sri … · 2019. 12. 5. · srit / picm105 – sfm...


SRIT / PICM105 – SFM / Correlation and Regression

SRIT / M & H / M. Vijaya Kumar 1

SRI RAMAKRISHNA INSTITUTE OF TECHNOLOGY

(AN AUTONOMOUS INSTITUTION)

COIMBATORE- 641010

PICM105 & STATISTICS FOR MANAGEMENT

Unit V

CORRELATION AND REGRESSION

Correlation

If a change in one variable is accompanied by a change in the other variable, the two variables are said to be correlated.

Positive correlation:

If the two variables deviate in the same direction, i.e., an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be positive.

Ex: Income and expenditure

Negative correlation:

If the two variables deviate in opposite directions, i.e., an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be negative.

Ex: Price and demand of a product

Rank correlation

Sometimes there is no marked linear relationship between two random variables, but a monotonic relation (as one increases, the other consistently increases, or consistently decreases) is clearly noticeable. In such a case, Pearson's correlation coefficient would measure only the strength and direction of the linear association between the variables of interest. This is where Spearman's rank correlation method is useful: it measures instead the strength and direction of the monotonic relation between the variables. This can be a good starting point for further evaluation.


The Spearman Rank-Order Correlation Coefficient

Spearman's correlation coefficient, represented by $\rho$ or by $r_s$, is a nonparametric measure of the strength and direction of the association that exists between two ranked variables. It determines the degree to which a relationship is monotonic, i.e., whether there is a monotonic component of the association between two continuous or ordinal variables.

Monotonicity is "less restrictive" than linearity. Although monotonicity is not actually a requirement of Spearman's correlation, it is not meaningful to use Spearman's correlation to determine the strength and direction of a monotonic relationship if we already know that the relationship between the two variables is not monotonic.

Spearman Ranking of the Data

We must rank the data before proceeding with the Spearman's rank correlation evaluation. This is necessary because we need to check whether, as one variable increases, the other follows a monotonic relation (consistently increases or consistently decreases) with respect to it.

Thus, at every level, we need to compare the values of the two variables. Ranking assigns such 'levels' to each value in the dataset so that the values can be compared easily.

Assign the numbers 1 to n (n = the number of data points) to the values of each variable, in order from highest to lowest.

If two or more values are identical, assign to each of them the arithmetic mean of the ranks they would otherwise have occupied.

For example, for the Selling Price values 28.2, 32.8, 19.4, 22.5, 20.0, 22.5, the corresponding ranks are 2, 1, 6, 3.5, 5, 3.5. The highest value, 32.8, is given rank 1, 28.2 is given rank 2, and so on. Two values are identical (22.5); they would have occupied ranks 3 and 4, so each receives the arithmetic mean (3 + 4)/2 = 3.5. The next value, 20.0, then receives rank 5, and the lowest value, 19.4, receives rank 6.
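The ranking rule above (rank 1 for the highest value, tied values sharing the mean of the ranks they would occupy) can be sketched in Python as follows. The function name `average_ranks` is our own illustrative choice, not a library function.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; tied values share the mean rank."""
    # Indices of the values, ordered from highest value to lowest
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the value at `start`
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end would occupy ranks start+1..end+1; use their mean
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


selling_price = [28.2, 32.8, 19.4, 22.5, 20.0, 22.5]
print(average_ranks(selling_price))  # [2.0, 1.0, 6.0, 3.5, 5.0, 3.5]
```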


Spearman’s Rank Correlation formula

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} $$

where $n$ is the number of data points of the two variables and $d_i$ is the difference between the ranks of the $i$-th element of the two variables. The Spearman correlation coefficient, $r_s$, can take values from $-1$ to $+1$.

I. A value of $r_s = +1$ indicates a perfect association of ranks.

II. A value of $r_s = 0$ indicates no association between ranks.

III. A value of $r_s = -1$ indicates a perfect negative association of ranks.

IV. The nearer $r_s$ is to zero, the weaker the association between the ranks.
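The formula translates directly into code when the ranks are already known and contain no ties. This is a minimal sketch; `spearman_from_ranks` is a hypothetical helper name, and it is checked against the ranks of Problem 3 below.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of the same length n >= 2")
    # d_i is the difference between the two ranks of the i-th item
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks of Problem 3 below
print(round(spearman_from_ranks([1, 2, 3, 4, 5, 6, 7], [4, 3, 1, 2, 6, 5, 7]), 3))  # 0.643
print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0 (perfectly reversed ranks)
```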

Merits of Rank Correlation Coefficient

1. Spearman’s rank correlation coefficient can be interpreted in the same way as the Karl

Pearson’s correlation coefficient;

2. It is easy to understand and easy to calculate;

3. If we want to see the association between qualitative characteristics, the rank correlation coefficient is the only one of these formulas that can be used;

4. Rank correlation coefficient is the non-parametric version of the Karl Pearson’s product

moment correlation coefficient; and

5. It does not require the assumption of the normality of the population from which the

sample observations are taken.

Demerits of Rank Correlation Coefficient

1. The product moment correlation coefficient can be calculated for a bivariate frequency distribution, but the rank correlation coefficient cannot; and

2. If n is large, this formula is time-consuming.


Problem: 1

The following table provides data about the percentage of students who have free

university meals and their CGPA scores. Calculate the Spearman’s Rank Correlation

between the two and interpret the result.

State University | % of students having free meals | % of students scoring above 8.5 CGPA
Pune | 14.4 | 54
Chennai | 7.2 | 64
Delhi | 27.5 | 44
Kanpur | 33.8 | 32
Ahmedabad | 38.0 | 37
Indore | 15.9 | 68
Guwahati | 4.9 | 62

Answer:

Let us first assign the random variables to the required data –

X – % of students having free meals

Y – % of students scoring above 8.5 CGPA

Before proceeding with the calculation, we’ll need to assign ranks to the data

corresponding to each state university. We construct the table for the rank as below –

X | Y | Rank in X | Rank in Y | d | d²
14.4 | 54 | 5 | 4 | 1 | 1
7.2 | 64 | 6 | 2 | 4 | 16
27.5 | 44 | 3 | 5 | -2 | 4
33.8 | 32 | 2 | 7 | -5 | 25
38.0 | 37 | 1 | 6 | -5 | 25
15.9 | 68 | 4 | 1 | 3 | 9
4.9 | 62 | 7 | 3 | 4 | 16
Total | | | | Σd = 0 | Σd² = 96

Rank Correlation

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(96)}{7(7^2-1)} = 1 - \frac{576}{336} = -0.714 $$

The result shows a strong negative correlation: the universities with the highest percentage of students on free meals tend to have the least successful results (the lowest percentage of students scoring above 8.5 CGPA).
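The Problem 1 calculation can be checked with a short Python sketch that first ranks each series from highest (rank 1) to lowest and then applies the Spearman formula. The helper names `ranks_descending` and `spearman` are our own, and the simple ranking assumes there are no ties, which holds for this data.

```python
def ranks_descending(values):
    """Rank 1 = highest value. Assumes all values are distinct (no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Return (sum of d^2, r_s) with r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


free_meals = [14.4, 7.2, 27.5, 33.8, 38.0, 15.9, 4.9]  # X: % of students having free meals
high_cgpa = [54, 64, 44, 32, 37, 68, 62]               # Y: % scoring above 8.5 CGPA

sum_d2, r_s = spearman(free_meals, high_cgpa)
print(sum_d2, round(r_s, 3))  # 96 -0.714
```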

Problem: 2

Compute the coefficient of rank correlation between sales and advertisement expressed in

thousands of dollars from the following data:

Sales 90 85 68 75 82 80 95 70

Advertisement 7 6 2 3 4 5 8 1

Answer:

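Sales (X) | Rank in X | Advertisement (Y) | Rank in Y | d | d²
90 | 2 | 7 | 2 | 0 | 0
85 | 3 | 6 | 3 | 0 | 0
68 | 8 | 2 | 7 | 1 | 1
75 | 6 | 3 | 6 | 0 | 0
82 | 4 | 4 | 5 | -1 | 1
80 | 5 | 5 | 4 | 1 | 1
95 | 1 | 8 | 1 | 0 | 0
70 | 7 | 1 | 8 | -1 | 1
Total | | | | Σd = 0 | Σd² = 4

Rank Correlation

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(4)}{8(8^2-1)} = 1 - \frac{24}{504} = 0.952 $$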

The result shows a strong positive correlation. Hence there is a very high degree of agreement between sales and advertisement.


Problem: 3

Find the rank correlation coefficient from the following data.

Rank in X 1 2 3 4 5 6 7

Rank in Y 4 3 1 2 6 5 7

Answer:

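d = (Rank in X) - (Rank in Y): -3, -1, 2, 2, -1, 1, 0

d²: 9, 1, 4, 4, 1, 1, 0

Σd = 0, Σd² = 20

Rank Correlation

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(20)}{7(7^2-1)} = 1 - \frac{120}{336} = 0.643 $$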

Problem: 4

The ranks of 16 students in mathematics and physics are as follows. Find the rank correlation coefficient for proficiency in mathematics and physics.

Rank in Maths 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rank in Physics 1 10 3 4 5 7 2 6 8 11 15 9 14 12 16 13

Answer:

d = (Rank in Maths) - (Rank in Physics): 0, -8, 0, 0, 0, -1, 5, 2, 1, -1, -4, 3, -1, 2, -1, 3

d²: 0, 64, 0, 0, 0, 1, 25, 4, 1, 1, 16, 9, 1, 4, 1, 9

Σd = 0, Σd² = 136

Rank Correlation

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(136)}{16(16^2-1)} = 1 - \frac{816}{4080} = 0.8 $$

Problem: 5

Suppose we have the ranks of 5 students in three subjects, Computer, Physics and Statistics, and we want to test which two subjects show the same trend.

Rank in Computer 2 4 5 1 3

Rank in Physics 5 1 2 3 4

Rank in Statistics 2 3 5 4 1

Answer:

In this problem the ranks are given directly. Let X, Y and Z denote the ranks in Computer, Physics and Statistics respectively.

X | Y | Z | d1 = X - Y | d1² | d2 = Y - Z | d2² | d3 = X - Z | d3²
2 | 5 | 2 | -3 | 9 | 3 | 9 | 0 | 0
4 | 1 | 3 | 3 | 9 | -2 | 4 | 1 | 1
5 | 2 | 5 | 3 | 9 | -3 | 9 | 0 | 0
1 | 3 | 4 | -2 | 4 | -1 | 1 | -3 | 9
3 | 4 | 1 | -1 | 1 | 3 | 9 | 2 | 4
Total | | | 0 | 32 | 0 | 32 | 0 | 14

Rank Correlation

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} $$

I. Rank correlation between Computer and Physics:

$$ r_{XY} = 1 - \frac{6(32)}{5(5^2-1)} = 1 - \frac{192}{120} = -0.6 $$

II. Rank correlation between Physics and Statistics:

$$ r_{YZ} = 1 - \frac{6(32)}{5(5^2-1)} = 1 - \frac{192}{120} = -0.6 $$

III. Rank correlation between Computer and Statistics:

$$ r_{XZ} = 1 - \frac{6(14)}{5(5^2-1)} = 1 - \frac{84}{120} = 0.3 $$

Since $r_{XY}$ and $r_{YZ}$ are negative, Computer and Physics, and likewise Physics and Statistics, have opposite trends. But $r_{XZ} = 0.3 > 0$ indicates that Computer and Statistics have the same trend.
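For Problem 5, all three pairwise coefficients can be computed in one loop. This is a sketch using only the Python standard library; `spearman_from_ranks` is a hypothetical helper name, and it assumes the ranks contain no ties, as in this problem.

```python
from itertools import combinations


def spearman_from_ranks(rank_x, rank_y):
    """r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for two untied rank lists."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


ranks = {
    "Computer": [2, 4, 5, 1, 3],
    "Physics": [5, 1, 2, 3, 4],
    "Statistics": [2, 3, 5, 4, 1],
}

# Rank correlation for every pair of subjects
for first, second in combinations(ranks, 2):
    r_s = spearman_from_ranks(ranks[first], ranks[second])
    print(f"{first} vs {second}: {r_s:.1f}")

# The pair with the largest (most positive) r_s shows the most similar trend
best = max(
    combinations(ranks, 2),
    key=lambda pair: spearman_from_ranks(ranks[pair[0]], ranks[pair[1]]),
)
print("Same trend:", best)  # Same trend: ('Computer', 'Statistics')
```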

Repeated Ranks

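If a value is repeated m times in either series (or in both), the rank correlation formula is modified by adding the correction factor $\frac{m(m^2-1)}{12}$ to $\sum d^2$ for each repeated value:

$$ r_s = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} $$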
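The tie-corrected formula can be sketched as follows. `average_ranks`, `tie_correction` and `spearman_with_ties` are illustrative helper names (not from a library); the ranking gives rank 1 to the highest value and gives tied values the mean of the ranks they would occupy. It is applied here to the data of Problem 6.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    # First rank occupied by v, shifted by (m - 1)/2 when v occurs m times
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that is repeated m > 1 times."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    """r_s = 1 - 6[sum(d^2) + correction] / (n(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Data of Problem 6
x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(tie_correction(x) + tie_correction(y))  # 3.0
print(round(spearman_with_ties(x, y), 3))     # 0.545
```

Calling `ordered.index` and `ordered.count` for every value makes the ranking quadratic in n, which is fine for classroom-sized data sets like these.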

Problem: 6

Determine the rank correlation coefficient for the following data.

X 68 64 75 50 64 80 75 40 55 64

Y 62 58 68 45 81 60 68 48 50 70

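Answer:

X | Y | Rank in X | Rank in Y | d | d²
68 | 62 | 4 | 5 | -1 | 1
64 | 58 | 6 | 7 | -1 | 1
75 | 68 | 2.5 | 3.5 | -1 | 1
50 | 45 | 9 | 10 | -1 | 1
64 | 81 | 6 | 1 | 5 | 25
80 | 60 | 1 | 6 | -5 | 25
75 | 68 | 2.5 | 3.5 | -1 | 1
40 | 48 | 10 | 9 | 1 | 1
55 | 50 | 8 | 8 | 0 | 0
64 | 70 | 6 | 2 | 4 | 16
Total | | | | Σd = 0 | Σd² = 72

To find Correction Factor:

In the X series, the value 75 is repeated two times (m = 2): $\frac{2(2^2-1)}{12} = 0.5$

In the X series, the value 64 is repeated three times (m = 3): $\frac{3(3^2-1)}{12} = 2$

In the Y series, the value 68 is repeated two times (m = 2): $\frac{2(2^2-1)}{12} = 0.5$

Rank Correlation:

$$ r_s = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} = 1 - \frac{6(72 + 0.5 + 2 + 0.5)}{10(10^2-1)} = 1 - \frac{450}{990} = 0.545 $$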

Problem: 7

A sample of 12 fathers and their eldest sons gave the following data about their heights in inches. Find the coefficient of rank correlation.

Fathers 65 63 67 64 68 62 70 66 68 67 69 71

Sons 68 66 68 65 69 66 68 65 71 67 68 70

Answer:

Fathers (X) | Sons (Y) | Rank in X | Rank in Y | d | d²
65 | 68 | 9 | 5.5 | 3.5 | 12.25
63 | 66 | 11 | 9.5 | 1.5 | 2.25
67 | 68 | 6.5 | 5.5 | 1 | 1
64 | 65 | 10 | 11.5 | -1.5 | 2.25
68 | 69 | 4.5 | 3 | 1.5 | 2.25
62 | 66 | 12 | 9.5 | 2.5 | 6.25
70 | 68 | 2 | 5.5 | -3.5 | 12.25
66 | 65 | 8 | 11.5 | -3.5 | 12.25
68 | 71 | 4.5 | 1 | 3.5 | 12.25
67 | 67 | 6.5 | 8 | -1.5 | 2.25
69 | 68 | 3 | 5.5 | -2.5 | 6.25
71 | 70 | 1 | 2 | -1 | 1
Total | | | | Σd = 0 | Σd² = 72.5

To find Correction Factor:

In the X series, the value 68 is repeated two times: $\frac{2(2^2-1)}{12} = 0.5$

In the X series, the value 67 is repeated two times: $\frac{2(2^2-1)}{12} = 0.5$

In the Y series, the value 68 is repeated four times: $\frac{4(4^2-1)}{12} = 5$

In the Y series, the value 66 is repeated two times: $\frac{2(2^2-1)}{12} = 0.5$

In the Y series, the value 65 is repeated two times: $\frac{2(2^2-1)}{12} = 0.5$

Rank Correlation:

$$ r_s = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} = 1 - \frac{6(72.5 + 0.5 + 0.5 + 5 + 0.5 + 0.5)}{12(12^2-1)} = 1 - \frac{477}{1716} = 0.722 $$
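The same approach reproduces Problem 7, where one value in the sons' series is repeated four times. The sketch below is self-contained; `average_ranks` and `spearman_parts` are illustrative helper names, and the function returns Σd², the total correction factor and r_s so that each can be compared with the hand calculation.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_parts(x, y):
    """Return (sum of d^2, total tie correction, tie-corrected r_s)."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    # m(m^2 - 1)/12 for every value repeated m > 1 times, in both series
    correction = sum(
        m * (m ** 2 - 1) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return sum_d2, correction, 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
sum_d2, correction, r_s = spearman_parts(fathers, sons)
print(sum_d2, correction, round(r_s, 3))  # 72.5 7.0 0.722
```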

Karl Pearson's Coefficient of Correlation

Karl Pearson's coefficient of correlation is a widely used mathematical method in which a numerical expression is used to calculate the degree and direction of the relationship between linearly related variables.

Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the most extensively used quantitative method in practice. The coefficient of correlation is denoted by r.

If the relationship between two variables X and Y is to be ascertained, then the following formula is used:

$$ r = r(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y} $$

where

$$ \operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y} $$

$$ \sigma_X = \sqrt{E(X^2) - [E(X)]^2} = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}, \qquad \sigma_Y = \sqrt{E(Y^2) - [E(Y)]^2} = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} $$

Note:

If Cov(X,Y) = 0, then E(XY) = E(X)E(Y).

If r(X,Y) = 0, i.e., Cov(X,Y) = 0, then X and Y are uncorrelated.
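Karl Pearson's formula, written with the divisor n used above, translates directly into Python. This is a sketch with a hypothetical function name `pearson_r`; because the same divisor appears in the covariance and in both standard deviations, using n - 1 throughout would give the same r. It is exercised on the data of Problem 8 below.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y), with divisor n throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y  # (1/n) sum xy - x_bar*y_bar
    sigma_x = sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    sigma_y = sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return cov / (sigma_x * sigma_y)


# Data of Problem 8 below
print(round(pearson_r([10, 14, 18, 22, 26, 30], [18, 12, 24, 6, 30, 36]), 3))  # 0.6
```

The shortcut "mean of squares minus square of the mean" is convenient by hand, but it can lose precision when the values are large; summing squared deviations from the mean is numerically safer.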


Coefficient of Determination

In statistical analysis, the coefficient of determination is used to predict and explain the future outcomes of a model. It is also known as R squared ($R^2$). It also acts as a guideline that helps in measuring the model's accuracy. In this section, we discuss the definition, formula, and properties of the coefficient of determination.

Definition: Coefficient of Determination

The coefficient of determination, or R squared, is the proportion of the variance in the dependent variable that is predicted from the independent variable. It indicates how much of the variation in the given data set is explained.

The coefficient of determination is the square of the correlation coefficient (r); thus it ranges from 0 to 1.

With simple linear regression, the coefficient of determination is equal to the square of the correlation between the x and y variables: $R^2 = r^2$.

If $R^2$ is equal to 0, the dependent variable cannot be predicted from the independent variable.

If $R^2$ is equal to 1, the dependent variable can be predicted from the independent variable without any error.

If $R^2$ is between 0 and 1, it indicates the extent to which the dependent variable is predictable. An $R^2$ of 0.10 means that 10 per cent of the variance in Y is predicted from X; an $R^2$ of 0.20 means that 20 per cent of the variance in Y is predicted from X, and so on.

The value of $R^2$ shows whether the model would be a good fit for the given data set. What counts as a good fit depends on the context of the analysis. For instance, in a few fields such as rocket science, $R^2$ is expected to be close to 100%. The minimum theoretical value is $R^2 = 0$; for a least-squares regression line $R^2$ is never negative, and in practice it is usually greater than 0.

The value of $R^2$ increases (or at least does not decrease) when a new predictor variable is added, even if that predictor is not associated with the outcome. The adjusted $R^2$ carries the same information as the original $R^2$, but it penalizes the number of predictor variables in the model. When new predictors are added to a multiple linear regression model, $R^2$ will increase; the adjusted $R^2$ increases only if the increase in $R^2$ is greater than would be expected by chance alone.

Properties of Coefficient of Determination

It gives the proportion of the variation in one variable that can be predicted from the other variable.

If we want to check how reliably predictions can be made from the given data, this measure tells us.

It gives the ratio Explained variation / Total variation:

$$ R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} $$

It also tells us the strength of the (linear) association between the variables.

If the value of $R^2$ gets close to 1, the values of y lie close to the regression line; similarly, if it gets close to 0, the values lie far away from the regression line.

It helps in determining the strength of association between different variables.
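The identity $R^2$ = explained variation / total variation can be checked numerically. The sketch below fits the least-squares line of y on x (the helper names `fit_line` and `r_squared` are our own) and uses the data of Problem 11 below; for a simple linear regression the result equals $r^2$, about 0.976 here.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x (the regression line of y on x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def r_squared(x, y):
    """R^2 = explained variation / total variation for the fitted line."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    fitted = [a + b * xi for xi in x]
    explained = sum((f - mean_y) ** 2 for f in fitted)  # sum of (y_hat - y_bar)^2
    total = sum((yi - mean_y) ** 2 for yi in y)          # sum of (y - y_bar)^2
    return explained / total


# Data of Problem 11
x = [1, 3, 5, 7, 8, 10]
y = [8, 12, 15, 17, 18, 20]
print(round(r_squared(x, y), 3))  # 0.976
```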

Problem: 8

Find the correlation coefficient for the following data.

X 10 14 18 22 26 30

Y 18 12 24 6 30 36

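Answer:

x | y | xy | x² | y²
10 | 18 | 180 | 100 | 324
14 | 12 | 168 | 196 | 144
18 | 24 | 432 | 324 | 576
22 | 6 | 132 | 484 | 36
26 | 30 | 780 | 676 | 900
30 | 36 | 1080 | 900 | 1296
Σx = 120 | Σy = 126 | Σxy = 2772 | Σx² = 2680 | Σy² = 3276

n = 6, x̄ = 120/6 = 20, ȳ = 126/6 = 21

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{2680}{6} - 20^2} = \sqrt{46.667} = 6.831 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{3276}{6} - 21^2} = \sqrt{105} = 10.247 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{2772}{6} - 20 \times 21 = 42 $$

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{42}{6.831 \times 10.247} = 0.6 $$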

Problem: 9

The table below shows the number of absences, x, in a Calculus course and the final exam grade, y, for 7 students. Find the correlation coefficient and interpret your result.

X 1 0 2 6 4 3 3

Y 95 90 90 55 70 80 85

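Answer:

x | y | x² | y² | xy
1 | 95 | 1 | 9025 | 95
0 | 90 | 0 | 8100 | 0
2 | 90 | 4 | 8100 | 180
6 | 55 | 36 | 3025 | 330
4 | 70 | 16 | 4900 | 280
3 | 80 | 9 | 6400 | 240
3 | 85 | 9 | 7225 | 255
Σx = 19 | Σy = 565 | Σx² = 75 | Σy² = 46775 | Σxy = 1380

n = 7, x̄ = 19/7 = 2.714, ȳ = 565/7 = 80.714

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{75}{7} - \left(\frac{19}{7}\right)^2} = \sqrt{10.714 - 7.367} = \sqrt{3.347} = 1.829 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{46775}{7} - \left(\frac{565}{7}\right)^2} = \sqrt{6682.143 - 6514.796} = \sqrt{167.347} = 12.936 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{1380}{7} - \frac{19}{7}\times\frac{565}{7} = 197.143 - 219.082 = -21.939 $$

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{-21.939}{1.829 \times 12.936} = -0.927 $$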

Interpret this result:

There is a strong negative correlation between the number of absences and the final exam grade, since r = -0.927 is very close to -1. Thus, as the number of absences increases, the final exam grade tends to decrease.


Problem: 10

The marks obtained by 10 students in Mathematics and Statistics are given below.

Find the correlation coefficient between the two subjects.

Marks in Maths 75 30 60 80 53 35 15 40 38 48

Marks in Stats 85 45 54 91 58 63 35 43 45 44

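Answer:

x | y | xy | x² | y²
75 | 85 | 6375 | 5625 | 7225
30 | 45 | 1350 | 900 | 2025
60 | 54 | 3240 | 3600 | 2916
80 | 91 | 7280 | 6400 | 8281
53 | 58 | 3074 | 2809 | 3364
35 | 63 | 2205 | 1225 | 3969
15 | 35 | 525 | 225 | 1225
40 | 43 | 1720 | 1600 | 1849
38 | 45 | 1710 | 1444 | 2025
48 | 44 | 2112 | 2304 | 1936
Σx = 474 | Σy = 563 | Σxy = 29591 | Σx² = 26132 | Σy² = 34815

n = 10, x̄ = 474/10 = 47.4, ȳ = 563/10 = 56.3

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{2613.2 - 47.4^2} = \sqrt{366.44} = 19.143 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{3481.5 - 56.3^2} = \sqrt{311.81} = 17.658 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = 2959.1 - 47.4 \times 56.3 = 290.48 $$

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{290.48}{19.143 \times 17.658} = 0.859 $$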


Problem: 11

Compute the coefficient of correlation between X and Y using the following data:

X 1 3 5 7 8 10

Y 8 12 15 17 18 20

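Answer:

x | y | xy | x² | y²
1 | 8 | 8 | 1 | 64
3 | 12 | 36 | 9 | 144
5 | 15 | 75 | 25 | 225
7 | 17 | 119 | 49 | 289
8 | 18 | 144 | 64 | 324
10 | 20 | 200 | 100 | 400
Σx = 34 | Σy = 90 | Σxy = 582 | Σx² = 248 | Σy² = 1446

n = 6, x̄ = 34/6 = 5.667, ȳ = 90/6 = 15

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{248}{6} - \left(\frac{34}{6}\right)^2} = \sqrt{41.333 - 32.111} = \sqrt{9.222} = 3.037 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{1446}{6} - 15^2} = \sqrt{16} = 4 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{582}{6} - \frac{34}{6} \times 15 = 97 - 85 = 12 $$

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{12}{3.037 \times 4} = 0.988 $$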


Problem: 12

Calculate the coefficient of correlation for the following data:

X 9 8 7 6 5 4 3 2 1

Y 15 16 14 13 11 12 10 8 9

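Answer:

x | y | xy | x² | y²
9 | 15 | 135 | 81 | 225
8 | 16 | 128 | 64 | 256
7 | 14 | 98 | 49 | 196
6 | 13 | 78 | 36 | 169
5 | 11 | 55 | 25 | 121
4 | 12 | 48 | 16 | 144
3 | 10 | 30 | 9 | 100
2 | 8 | 16 | 4 | 64
1 | 9 | 9 | 1 | 81
Σx = 45 | Σy = 108 | Σxy = 597 | Σx² = 285 | Σy² = 1356

n = 9, x̄ = 45/9 = 5, ȳ = 108/9 = 12

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{285}{9} - 5^2} = \sqrt{6.667} = 2.582 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{1356}{9} - 12^2} = \sqrt{6.667} = 2.582 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{597}{9} - 5 \times 12 = 6.333 $$

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{6.333}{2.582 \times 2.582} = 0.95 $$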

Regression

Regression analysis is most often used for prediction. The goal in regression analysis is to create a mathematical model that can be used to predict the values of a dependent variable based upon the values of an independent variable. In other words, we use the model to predict the value of Y when we know the value of X. (The dependent variable is the one to be predicted.) Correlation analysis is often used together with regression analysis because correlation analysis measures the strength of association between the two variables X and Y.

In regression analysis involving one independent variable and one dependent variable, the values are frequently plotted in two dimensions as a scatter plot. The scatter plot allows us to inspect the data visually before running a regression analysis. Often this step shows whether the relationship between the two variables is increasing or decreasing, although it gives only a rough idea of the relationship. The simplest relationship between two variables is a straight-line, or linear, relationship. Of course, the data may well be curvilinear, in which case we would have to use a different model to describe the relationship (we deal only with linear relationships for now). Simple linear regression analysis finds the straight line that best fits the data.

Definition:

Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

The equation of the line of regression of y on x is

$$ y - \bar{y} = b_{yx}(x - \bar{x}) $$

The equation of the line of regression of x on y is

$$ x - \bar{x} = b_{xy}(y - \bar{y}) $$

Regression coefficients

$$ b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\operatorname{Cov}(x,y)}{\sigma_x^2}, \qquad b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\operatorname{Cov}(x,y)}{\sigma_y^2} $$

Correlation coefficient

$$ r = \pm\sqrt{b_{yx}\, b_{xy}} $$

where r takes the common sign of the two regression coefficients.
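The regression coefficients and both lines of regression can be computed with a short sketch (the function name `regression_lines` is our own). It uses $b_{yx} = \operatorname{Cov}/\sigma_x^2$, $b_{xy} = \operatorname{Cov}/\sigma_y^2$ and $r = \pm\sqrt{b_{yx} b_{xy}}$ with the sign of the covariance, and is run on the data of Problem 13 below.

```python
def regression_lines(x, y):
    """Return (x_mean, y_mean, b_yx, b_xy, r) using divisor n for Cov and variances."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - x_mean * y_mean
    var_x = sum(a * a for a in x) / n - x_mean ** 2
    var_y = sum(b * b for b in y) / n - y_mean ** 2
    b_yx = cov / var_x  # slope of the line of regression of y on x
    b_xy = cov / var_y  # slope of the line of regression of x on y
    # r = ±sqrt(b_yx * b_xy), taking the common sign of the two coefficients
    r = (b_yx * b_xy) ** 0.5
    return x_mean, y_mean, b_yx, b_xy, (r if cov >= 0 else -r)


# Data of Problem 13
x_mean, y_mean, b_yx, b_xy, r = regression_lines([1, 2, 3, 4, 5, 6, 7], [9, 8, 10, 12, 11, 13, 14])
print(f"y - {y_mean:g} = {b_yx:.4f}(x - {x_mean:g})")  # y - 11 = 0.9286(x - 4)
print(f"x - {x_mean:g} = {b_xy:.4f}(y - {y_mean:g})")  # x - 4 = 0.9286(y - 11)
print(f"r = {r:.4f}")                                  # r = 0.9286
```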

Problem: 13

Obtain the equations of the lines of regression from the following data:

X 1 2 3 4 5 6 7

Y 9 8 10 12 11 13 14

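Answer:

x | y | xy | x² | y²
1 | 9 | 9 | 1 | 81
2 | 8 | 16 | 4 | 64
3 | 10 | 30 | 9 | 100
4 | 12 | 48 | 16 | 144
5 | 11 | 55 | 25 | 121
6 | 13 | 78 | 36 | 169
7 | 14 | 98 | 49 | 196
Σx = 28 | Σy = 77 | Σxy = 334 | Σx² = 140 | Σy² = 875

n = 7, x̄ = 28/7 = 4, ȳ = 77/7 = 11

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{140}{7} - 4^2} = \sqrt{4} = 2 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{875}{7} - 11^2} = \sqrt{4} = 2 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{334}{7} - 4 \times 11 = 3.714 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{3.714}{2 \times 2} = 0.929 $$

The line of regression of y on x is

$$ b_{yx} = \frac{\operatorname{Cov}(x,y)}{\sigma_x^2} = \frac{3.714}{4} = 0.9286, \qquad y - 11 = 0.9286(x - 4) \implies y = 0.9286x + 7.2857 $$

The line of regression of x on y is

$$ b_{xy} = \frac{\operatorname{Cov}(x,y)}{\sigma_y^2} = \frac{3.714}{4} = 0.9286, \qquad x - 4 = 0.9286(y - 11) \implies x = 0.9286y - 6.2143 $$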

Problem: 14

From the following data, find

(i) the two regression lines,

(ii) the coefficient of correlation between the marks in Economics and Statistics,

(iii) the most likely marks in Statistics when the marks in Economics are 30.

Marks in Economics: 25 28 35 32 31 36 29 38 34 32

Marks in Statistics: 43 46 49 41 36 32 31 30 33 39

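Answer:

Let x denote the marks in Economics and y the marks in Statistics.

x | y | xy | x² | y²
25 | 43 | 1075 | 625 | 1849
28 | 46 | 1288 | 784 | 2116
35 | 49 | 1715 | 1225 | 2401
32 | 41 | 1312 | 1024 | 1681
31 | 36 | 1116 | 961 | 1296
36 | 32 | 1152 | 1296 | 1024
29 | 31 | 899 | 841 | 961
38 | 30 | 1140 | 1444 | 900
34 | 33 | 1122 | 1156 | 1089
32 | 39 | 1248 | 1024 | 1521
Σx = 320 | Σy = 380 | Σxy = 12067 | Σx² = 10380 | Σy² = 14838

n = 10, x̄ = 320/10 = 32, ȳ = 380/10 = 38

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{10380}{10} - 32^2} = \sqrt{14} = 3.742 $$

$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{14838}{10} - 38^2} = \sqrt{39.8} = 6.309 $$

$$ \operatorname{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{12067}{10} - 32 \times 38 = -9.3 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{-9.3}{3.742 \times 6.309} = -0.394 $$

The line of regression of y on x is

$$ b_{yx} = \frac{-9.3}{14} = -0.6643, \qquad y - 38 = -0.6643(x - 32) \implies y = -0.6643x + 59.257 $$

The line of regression of x on y is

$$ b_{xy} = \frac{-9.3}{39.8} = -0.2337, \qquad x - 32 = -0.2337(y - 38) \implies x = -0.2337y + 40.879 $$

The most likely marks in Statistics when the marks in Economics are 30:

$$ y = -0.6643(30) + 59.257 = 39.33 $$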
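Predicting with the line of regression of y on x only needs the two means and $b_{yx}$. In this sketch `predict_y` is a hypothetical helper name; it reproduces part (iii) of Problem 14 and, with the data of Problem 16 below, the estimate at a price of Rs 124/kg.

```python
def predict_y(x, y, x_new):
    """Estimate y at x_new from the line y - y_bar = b_yx (x - x_bar)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b_yx = s_xy / s_xx  # equals Cov(x, y) / var(x)
    return mean_y + b_yx * (x_new - mean_x)


# Problem 14: marks in Statistics when marks in Economics are 30
economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
print(round(predict_y(economics, statistics, 30), 2))  # 39.33

# Problem 16: quantity purchased (kg) when the price rises to Rs 124/kg
price = [96, 110, 100, 90, 86, 92, 112, 112, 108, 116, 86, 92]
quantity = [250, 200, 250, 280, 300, 300, 220, 220, 200, 210, 300, 250]
print(round(predict_y(price, quantity, 124), 2))  # 170.48
```

Note that a price of Rs 124/kg lies outside the observed range (86 to 116), so the second figure is an extrapolation and should be treated with caution.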


Problem: 15

A tyre manufacturing company is interested in removing pollutants from the exhaust at its factory, and cost is a concern. The company has collected data from other companies concerning the amount of money spent on environmental measures and the resulting amount of dangerous pollutants released (as a percentage of total emissions).

Money spent (Rupees in lakhs) | 8.4 | 10.2 | 16.5 | 21.7 | 9.4 | 8.3 | 11.5 | 18.4 | 16.7 | 19.3 | 28.4 | 4.7 | 12.3
Percentage of dangerous pollutants | 35.9 | 31.8 | 24.7 | 25.2 | 36.8 | 35.8 | 33.4 | 25.4 | 31.4 | 27.4 | 15.8 | 31.5 | 28.9

a) Compute the regression equation.

b) Predict the percentage of dangerous pollutants released when Rs. 20,000 is spent on control measures.

c) Find the standard error of the estimate (regression line).

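Answer:

S. No | x | y | x² | y² | xy
1 | 8.4 | 35.9 | 70.56 | 1288.81 | 301.56
2 | 10.2 | 31.8 | 104.04 | 1011.24 | 324.36
3 | 16.5 | 24.7 | 272.25 | 610.09 | 407.55
4 | 21.7 | 25.2 | 470.89 | 635.04 | 546.84
5 | 9.4 | 36.8 | 88.36 | 1354.24 | 345.92
6 | 8.3 | 35.8 | 68.89 | 1281.64 | 297.14
7 | 11.5 | 33.4 | 132.25 | 1115.56 | 384.10
8 | 18.4 | 25.4 | 338.56 | 645.16 | 467.36
9 | 16.7 | 31.4 | 278.89 | 985.96 | 524.38
10 | 19.3 | 27.4 | 372.49 | 750.76 | 528.82
11 | 28.4 | 15.8 | 806.56 | 249.64 | 448.72
12 | 4.7 | 31.5 | 22.09 | 992.25 | 148.05
13 | 12.3 | 28.9 | 151.29 | 835.21 | 355.47
Total | Σx = 185.8 | Σy = 384.0 | Σx² = 3177.12 | Σy² = 11755.60 | Σxy = 5080.27

a) Regression equation

n = 13, x̄ = 185.8/13 = 14.292, ȳ = 384/13 = 29.538

$$ \sigma_x = \sqrt{\frac{3177.12}{13} - \left(\frac{185.8}{13}\right)^2} = \sqrt{244.394 - 204.270} = \sqrt{40.124} = 6.334 $$

$$ \sigma_y = \sqrt{\frac{11755.60}{13} - \left(\frac{384}{13}\right)^2} = \sqrt{904.277 - 872.521} = \sqrt{31.756} = 5.635 $$

$$ \operatorname{Cov}(x,y) = \frac{5080.27}{13} - \frac{185.8}{13}\times\frac{384}{13} = 390.790 - 422.173 = -31.383 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{-31.383}{6.334 \times 5.635} = -0.879 $$

The line of regression of y on x is

$$ b_{yx} = \frac{\operatorname{Cov}(x,y)}{\sigma_x^2} = \frac{-31.383}{40.124} = -0.7822, \qquad y - 29.538 = -0.7822(x - 14.292) \implies y = 40.717 - 0.7822x $$

The line of regression of x on y is

$$ b_{xy} = \frac{\operatorname{Cov}(x,y)}{\sigma_y^2} = \frac{-31.383}{31.756} = -0.9882, \qquad x - 14.292 = -0.9882(y - 29.538) \implies x = 43.483 - 0.9882y $$

b) When Rs. 20,000 is spent on control measures (taking x = 20), the percentage of dangerous pollutants released is

$$ y = 40.717 - 0.7822(20) = 25.07 $$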

c) Standard Error of the Estimate

Using the fitted line ŷ = 40.717 - 0.7822x:

S. No | x | y | ŷ | (y - ŷ)²
1 | 8.4 | 35.9 | 34.15 | 3.073
2 | 10.2 | 31.8 | 32.74 | 0.882
3 | 16.5 | 24.7 | 27.81 | 9.683
4 | 21.7 | 25.2 | 23.74 | 2.118
5 | 9.4 | 36.8 | 33.36 | 11.799
6 | 8.3 | 35.8 | 34.23 | 2.480
7 | 11.5 | 33.4 | 31.72 | 2.814
8 | 18.4 | 25.4 | 26.33 | 0.857
9 | 16.7 | 31.4 | 27.66 | 14.023
10 | 19.3 | 27.4 | 25.62 | 3.162
11 | 28.4 | 15.8 | 18.50 | 7.312
12 | 4.7 | 31.5 | 37.04 | 30.704
13 | 12.3 | 28.9 | 31.10 | 4.826
Total | | | | Σ(y - ŷ)² = 93.733

$$ S_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{93.733}{13 - 2}} = \sqrt{8.521} = 2.919 $$
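The standard error of the estimate can be sketched as below. The function name is our own, and the divisor n - 2 is an assumption (the usual choice for the standard error of the estimate; some texts divide by n instead, which gives a slightly smaller value).

```python
from math import sqrt


def standard_error_of_estimate(x, y):
    """S_e = sqrt(sum((y - y_hat)^2) / (n - 2)) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    # Squared vertical distances of the points from the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (n - 2))


money = [8.4, 10.2, 16.5, 21.7, 9.4, 8.3, 11.5, 18.4, 16.7, 19.3, 28.4, 4.7, 12.3]
pollutants = [35.9, 31.8, 24.7, 25.2, 36.8, 35.8, 33.4, 25.4, 31.4, 27.4, 15.8, 31.5, 28.9]
print(round(standard_error_of_estimate(money, pollutants), 3))  # 2.919
```

With the data of Problems 17 and 18 below, the same function gives about 0.96 and 1.53 respectively.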

Problem: 16

The quantity of a raw material purchased by a company at the specified prices during the 12 months of 1992 is given below.

Month | Price/kg (Rs) | Quantity (kg)
Jan | 96 | 250
Feb | 110 | 200
Mar | 100 | 250
Apr | 90 | 280
May | 86 | 300
June | 92 | 300
July | 112 | 220
Aug | 112 | 220
Sep | 108 | 200
Oct | 116 | 210
Nov | 86 | 300
Dec | 92 | 250

a) Find the regression equation based on the above data.

b) Can you estimate the appropriate quantity likely to be purchased if the price shoots up to Rs 124/kg?

c) Hence, or otherwise, obtain the coefficient of correlation between the prevailing price and the quantity demanded.

Answer:

S. No | Price x | Quantity y | x² | y² | xy
1 | 96 | 250 | 9216 | 62500 | 24000
2 | 110 | 200 | 12100 | 40000 | 22000
3 | 100 | 250 | 10000 | 62500 | 25000
4 | 90 | 280 | 8100 | 78400 | 25200
5 | 86 | 300 | 7396 | 90000 | 25800
6 | 92 | 300 | 8464 | 90000 | 27600
7 | 112 | 220 | 12544 | 48400 | 24640
8 | 112 | 220 | 12544 | 48400 | 24640
9 | 108 | 200 | 11664 | 40000 | 21600
10 | 116 | 210 | 13456 | 44100 | 24360
11 | 86 | 300 | 7396 | 90000 | 25800
12 | 92 | 250 | 8464 | 62500 | 23000
Total | Σx = 1200 | Σy = 2980 | Σx² = 121344 | Σy² = 756800 | Σxy = 293640

a) Regression equation

n = 12, x̄ = 1200/12 = 100, ȳ = 2980/12 = 248.333

$$ \sigma_x = \sqrt{\frac{121344}{12} - 100^2} = \sqrt{112} = 10.583 $$

$$ \sigma_y = \sqrt{\frac{756800}{12} - 248.333^2} = \sqrt{63066.667 - 61669.444} = \sqrt{1397.222} = 37.379 $$

$$ \operatorname{Cov}(x,y) = \frac{293640}{12} - 100 \times 248.333 = 24470 - 24833.333 = -363.333 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{-363.333}{10.583 \times 37.379} = -0.918 $$

The line of regression of y on x is

$$ b_{yx} = \frac{-363.333}{112} = -3.2440, \qquad y - 248.333 = -3.2440(x - 100) \implies y = 572.74 - 3.2440x $$

The line of regression of x on y is

$$ b_{xy} = \frac{-363.333}{1397.222} = -0.2600, \qquad x - 100 = -0.2600(y - 248.333) \implies x = 164.58 - 0.2600y $$

b) Estimation of the appropriate quantity likely to be purchased if the price shoots up to Rs 124/kg:

Given price x = 124, then

$$ y = 572.74 - 3.2440(124) = 170.48 \text{ kg} $$

c) Correlation coefficient

$$ r = -\sqrt{b_{yx}\, b_{xy}} = -\sqrt{3.2440 \times 0.2600} = -0.918 $$

The negative sign is taken because both regression coefficients are negative.

Problem: 17

Find the standard error of the estimate from the data given below.

X 1 2 3 4 5

Y 1 2 1.3 3.75 2.25

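Answer:

S. No | x | y | x² | y² | xy
1 | 1 | 1 | 1 | 1 | 1
2 | 2 | 2 | 4 | 4 | 4
3 | 3 | 1.30 | 9 | 1.69 | 3.90
4 | 4 | 3.75 | 16 | 14.0625 | 15.00
5 | 5 | 2.25 | 25 | 5.0625 | 11.25
Total | Σx = 15 | Σy = 10.3 | Σx² = 55 | Σy² = 25.815 | Σxy = 35.15

a) Regression equation

n = 5, x̄ = 15/5 = 3, ȳ = 10.3/5 = 2.06

$$ \sigma_x = \sqrt{\frac{55}{5} - 3^2} = \sqrt{2} = 1.414 $$

$$ \sigma_y = \sqrt{\frac{25.815}{5} - 2.06^2} = \sqrt{0.9194} = 0.959 $$

$$ \operatorname{Cov}(x,y) = \frac{35.15}{5} - 3 \times 2.06 = 0.85 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{0.85}{1.414 \times 0.959} = 0.627 $$

The line of regression of y on x is

$$ b_{yx} = \frac{0.85}{2} = 0.425, \qquad y - 2.06 = 0.425(x - 3) \implies y = 0.425x + 0.785 $$

b) Standard Error of the Estimate

S. No | x | y | ŷ | (y - ŷ)²
1 | 1 | 1 | 1.210 | 0.044
2 | 2 | 2 | 1.635 | 0.133
3 | 3 | 1.30 | 2.060 | 0.578
4 | 4 | 3.75 | 2.485 | 1.600
5 | 5 | 2.25 | 2.910 | 0.436
Total | | | | Σ(y - ŷ)² = 2.791

$$ S_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{2.791}{5 - 2}} = \sqrt{0.930} \approx 0.96 $$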

Problem: 18

Find the standard error of the estimate from the data given below.

X 1 2 3 4 5 6 7

Y 2 4 7 6 5 6 5

Answer:

S. No | x | y | x² | y² | xy
1 | 1 | 2 | 1 | 4 | 2
2 | 2 | 4 | 4 | 16 | 8
3 | 3 | 7 | 9 | 49 | 21
4 | 4 | 6 | 16 | 36 | 24
5 | 5 | 5 | 25 | 25 | 25
6 | 6 | 6 | 36 | 36 | 36
7 | 7 | 5 | 49 | 25 | 35
Total | Σx = 28 | Σy = 35 | Σx² = 140 | Σy² = 191 | Σxy = 151

a) Regression equation

n = 7, x̄ = 28/7 = 4, ȳ = 35/7 = 5

$$ \sigma_x = \sqrt{\frac{140}{7} - 4^2} = \sqrt{4} = 2 $$

$$ \sigma_y = \sqrt{\frac{191}{7} - 5^2} = \sqrt{2.286} = 1.512 $$

$$ \operatorname{Cov}(x,y) = \frac{151}{7} - 4 \times 5 = 1.571 $$

Correlation coefficient

$$ r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{1.571}{2 \times 1.512} = 0.520 $$

The line of regression of y on x is

$$ b_{yx} = \frac{1.571}{4} = 0.3929, \qquad y - 5 = 0.3929(x - 4) \implies y = 0.3929x + 3.4286 $$

b) Standard Error of the Estimate

S. No | x | y | ŷ | (y - ŷ)²
1 | 1 | 2 | 3.82 | 3.318
2 | 2 | 4 | 4.21 | 0.046
3 | 3 | 7 | 4.61 | 5.726
4 | 4 | 6 | 5.00 | 1.000
5 | 5 | 5 | 5.39 | 0.154
6 | 6 | 6 | 5.79 | 0.046
7 | 7 | 5 | 6.18 | 1.389
Total | | | | Σ(y - ŷ)² = 11.679

$$ S_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{11.679}{7 - 2}} = \sqrt{2.336} = 1.528 $$

Problem: 19

The two lines of regression are . The

variance of is 9. Find (i) The mean values of and (ii) correlation coefficient between

and .

Answer:

Since both the lines of regression passes through the mean values and .

The point ( ) must satisfy the two lines.

( )

( )

Solving ( ) and ( ), we get

The mean values of x and y are

Consider the line is a regression line on .

Consider the line is a regression line on .


Correlation coefficient $r = \pm\sqrt{b_{yx}\,b_{xy}}$, taking the common sign of $b_{yx}$ and $b_{xy}$

Correlation coefficient

Problem: 20

The two lines of regression are . Find ( ) and

the correlation coefficient between x and y.

Answer:

Since both lines of regression pass through the point of means $(\bar{x}, \bar{y})$, this point must satisfy both lines.

( )

( )

Solving ( ) and ( ), we get

Consider the line is a regression line on .

Consider the line is a regression line on .


Correlation coefficient $r = \pm\sqrt{b_{yx}\,b_{xy}}$, taking the common sign of $b_{yx}$ and $b_{xy}$

Correlation coefficient
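The reasoning used in Problems 19 and 20 has two steps: both lines pass through $(\bar{x}, \bar{y})$, and $r = \pm\sqrt{b_{yx}\,b_{xy}}$. Both steps can be checked numerically. The original equations of Problems 19 and 20 are not legible here, so the sketch below uses the two regression lines of the Problem 18 data instead. Those data give $\bar{X} = 4$, $\bar{Y} = 5$, $b_{yx} = 11/28$ and $b_{xy} = \operatorname{cov}(X, Y)/\sigma_Y^2 = 0.6875$. Each line is written in the assumed form p·x + q·y = c. The helper names are hypothetical.

```python
from math import sqrt


def means_from_lines(line1, line2):
    """Intersect p*x + q*y = c for two lines (Cramer's rule) -> (x_bar, y_bar)."""
    p1, q1, c1 = line1
    p2, q2, c2 = line2
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("parallel lines: no unique point of means")
    x_bar = (c1 * q2 - c2 * q1) / det
    y_bar = (p1 * c2 - p2 * c1) / det
    return x_bar, y_bar


def r_from_lines(y_on_x, x_on_y):
    """r = +/- sqrt(b_yx * b_xy), with the common sign of the two coefficients."""
    p1, q1, _ = y_on_x
    p2, q2, _ = x_on_y
    b_yx = -p1 / q1   # slope after solving the first line for y
    b_xy = -q2 / p2   # slope after solving the second line for x
    product = b_yx * b_xy
    if product < 0 or product > 1:
        # b_yx * b_xy = r^2 must lie in [0, 1]; otherwise the lines were assigned the wrong way round
        raise ValueError("swap the lines: b_yx * b_xy must lie between 0 and 1")
    return sqrt(product) if b_yx >= 0 else -sqrt(product)


# Regression lines of the Problem 18 data, written as p*x + q*y = c
y_on_x = (11, -28, -96)   # y = (11/28) x + 24/7
x_on_y = (16, -11, 9)     # x = (11/16) y + 9/16
print(means_from_lines(y_on_x, x_on_y))       # (4.0, 5.0)
print(round(r_from_lines(y_on_x, x_on_y), 4))  # 0.5197
```

The range check reflects the usual rule for deciding which line is Y on X. If the product of the two coefficients comes out greater than 1, the assumed assignment of the lines is wrong and must be reversed.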

Problem: 21

The regression equation of and is . If the mean value of

and the correlation coefficient.

Answer:

Given the regression equation of and is

.

Since the line of regression passes through the point of means $(\bar{x}, \bar{y})$, we have

Also given mean value of is

( )

Hence mean value of is 48.

which is the line of regression of on .


[

]

( )

( )

.

/

Problem: 22

If and are uncorrelated random variables with variances and . Find the

correlation coefficient between and .

Answer:

Given that ( ) ( )

Let us take and

Given that both and are uncorrelated.

Now ( ) ( )

( ) ( ) , ( ) ( ) ( )-

( )

and ( ) ( )

( ) ( ) , ( ) ( ) ( )-

( )


( ) ,( )( )-

( )

( ) ( )

( ) ,( )- ( ) ( )

( ) ,( )- ( ) ( )

Now ( ) ( ) ( ) ( )

( ) ( ) * ( ) ( )+ * ( ) ( )+

( ) ( ) ,* ( )+ * ( )+ -

, ( ) * ( )+ - , ( ) * ( )+ -

( ) ( )

( ) ( )

( )

Problem: 24

If the independent random variables and have the variances 3 and

respectively, find the correlation coefficient between and .

Answer:

Given that ( ) ( )

Let us take and

Given that both and are uncorrelated.

Now ( ) ( )

( ) ( ) , ( ) ( ) ( )-

( )


and ( ) ( )

( ) ( ) , ( ) ( ) ( )-

( )

( ) ,( )( )-

( )

( ) ( )

( ) ,( )- ( ) ( )

( ) ,( )- ( ) ( )

Now ( ) ( ) ( ) ( )

( ) ( ) * ( ) ( )+ * ( ) ( )+

( ) ( ) ,* ( )+ * ( )+ -

, ( ) * ( )+ - , ( ) * ( )+ -

( ) ( )

( ) ( )

√ √

( )
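Problems 22 and 24 rest on two identities for uncorrelated X and Y. The covariance of two combinations is $\operatorname{cov}(aX + bY,\ cX + dY) = ac\,\sigma_X^2 + bd\,\sigma_Y^2$. The variance of a combination is $\operatorname{var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2$. The specific combinations and variances of those problems are not legible here. The sketch below therefore uses purely hypothetical values for illustration: $\sigma_X^2 = 36$, $\sigma_Y^2 = 16$, $U = X + Y$ and $V = X - Y$.

```python
from math import sqrt


def corr_of_combinations(a, b, c, d, var_x, var_y):
    """Correlation of U = aX + bY and V = cX + dY, assuming cov(X, Y) = 0."""
    cov_uv = a * c * var_x + b * d * var_y     # cross terms vanish because cov(X, Y) = 0
    var_u = a * a * var_x + b * b * var_y
    var_v = c * c * var_x + d * d * var_y
    return cov_uv / sqrt(var_u * var_v)


# Hypothetical example: var(X) = 36, var(Y) = 16, U = X + Y, V = X - Y
print(corr_of_combinations(1, 1, 1, -1, 36, 16))
```

For U = X + Y and V = X − Y, the result reduces to $(\sigma_X^2 - \sigma_Y^2)/(\sigma_X^2 + \sigma_Y^2)$.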


Curve Fitting

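The principle of least squares

Of all curves of a chosen form, the principle of least squares selects the one for which the sum of the squares of the deviations of the observed values from the fitted values, $\sum (y - \hat{y})^2$, is a minimum.

Fitting a Straight line

Let $(x_i, y_i),\ i = 1, 2, \dots, n$ be a set of observations related by $y = a + bx$. Calculate $\sum x$, $\sum y$, $\sum x^2$ and $\sum xy$, and solve the normal equations

$\sum y = na + b\sum x$

$\sum xy = a\sum x + b\sum x^2$

for $a$ and $b$. Substituting these values in $y = a + bx$ gives the best-fitting straight line.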

Problem: 25

By the method of least squares find the best fitting straight line to the data given below.

x 5 10 15 20 25

y 15 19 23 26 30

Answer:

Let the straight line be $y = a + bx$.

The normal equations are

$\sum y = na + b\sum x \quad (1)$

$\sum xy = a\sum x + b\sum x^2 \quad (2)$

S.No x y x² xy

1 5 15 25 75

2 10 19 100 190

3 15 23 225 345

4 20 26 400 520

5 25 30 625 750

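Total ∑x = 75 ∑y = 113 ∑x² = 1375 ∑xy = 1880

Therefore equations (1) and (2) become

$113 = 5a + 75b \quad (3)$

$1880 = 75a + 1375b \quad (4)$

Solving, we get $a = 11.5$ and $b = 0.74$.

Therefore the best-fitting straight line is $y = 11.5 + 0.74x$.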
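The normal-equation procedure of Problems 25 to 27 can be sketched in a few lines of Python. The helper name fit_straight_line is hypothetical. It solves the two normal equations for y = a + bx by Cramer's rule.

```python
def fit_straight_line(xs, ys):
    """Least-squares line y = a + b*x from the normal equations
        sum(y)  = n*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    solved by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Problem 25
a, b = fit_straight_line([5, 10, 15, 20, 25], [15, 19, 23, 26, 30])
print(f"y = {a:.2f} + {b:.2f}x")
```

Calling the same function with the data of Problems 26 and 27 gives their lines in the same way.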


Problem: 26

Fit a straight line to the following data. Also find the value of y at

x 0 1 2 3 4

y 1 1.8 3.3 4.5 6.3

Answer:

Let the straight line be $y = a + bx$.

The normal equations are

$\sum y = na + b\sum x \quad (1)$

$\sum xy = a\sum x + b\sum x^2 \quad (2)$

S.No x y x² xy

1 0 1 0 0

2 1 1.8 1 1.8

3 2 3.3 4 6.6

4 3 4.5 9 13.5

5 4 6.3 16 25.2

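Total ∑x = 10 ∑y = 16.9 ∑x² = 30 ∑xy = 47.1

Therefore equations (1) and (2) become

$16.9 = 5a + 10b \quad (3)$

$47.1 = 10a + 30b \quad (4)$

Solving, we get $a = 0.72$ and $b = 1.33$.

Therefore the best-fitting straight line is $y = 0.72 + 1.33x$.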

To find at :

( ) ( )

.

Problem: 27

Fit a straight line to the following data. Also estimate the value at .

x 71 68 73 69 67 65 66 67

y 69 72 70 70 68 67 68 64

Answer:


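Let the straight line be $y = a + bx$.

The normal equations are

$\sum y = na + b\sum x \quad (1)$

$\sum xy = a\sum x + b\sum x^2 \quad (2)$

S.No x y x² xy

1 71 69 5041 4899

2 68 72 4624 4896

3 73 70 5329 5110

4 69 70 4761 4830

5 67 68 4489 4556

6 65 67 4225 4355

7 66 68 4356 4488

8 67 64 4489 4288

Total ∑x = 546 ∑y = 548 ∑x² = 37314 ∑xy = 37422

Therefore equations (1) and (2) become

$548 = 8a + 546b \quad (3)$

$37422 = 546a + 37314b \quad (4)$

Solving, we get $a = 39.55$ and $b = 0.424$.

Therefore the best-fitting straight line is $y = 39.55 + 0.424x$.

To find y at :

( ) ( )

Fitting a Parabola

Let $(x_i, y_i),\ i = 1, 2, \dots, n$ be a set of observations related by $y = a + bx + cx^2$. Calculate the required sums and solve the normal equations

$\sum y = na + b\sum x + c\sum x^2 \quad (1)$

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \quad (2)$

$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4 \quad (3)$

for $a$, $b$ and $c$. Substituting these values in $y = a + bx + cx^2$ gives the best-fitting parabola.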


Problem: 28

By the method of least squares, find the best-fitting parabola (second-degree curve) to the data given below.

x 1 2 3 4

y 1.7 1.8 2.3 3.2

Answer:

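Let the parabola be $y = a + bx + cx^2$.

The normal equations are

$\sum y = na + b\sum x + c\sum x^2 \quad (1)$

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \quad (2)$

$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4 \quad (3)$

x y x² x³ x⁴ xy x²y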

1 1.7 1 1 1 1.7 1.7

2 1.8 4 8 16 3.6 7.2

3 2.3 9 27 81 6.9 20.7

4 3.2 16 64 256 12.8 51.2

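∑x = 10 ∑y = 9.0 ∑x² = 30 ∑x³ = 100 ∑x⁴ = 354 ∑xy = 25.0 ∑x²y = 80.8

Therefore equations (1), (2) and (3) become

$9 = 4a + 10b + 30c \quad (4)$

$25 = 10a + 30b + 100c \quad (5)$

$80.8 = 30a + 100b + 354c \quad (6)$

Solving, we get $a = 2$, $b = -0.5$ and $c = 0.2$.

Therefore the best-fitting parabola is $y = 2 - 0.5x + 0.2x^2$.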
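For a parabola, the three normal equations form a 3 × 3 linear system. The sketch below solves it by Gaussian elimination with partial pivoting, assuming that plain elimination is adequate for such small systems. For large x values, shifting the origin first, as is done in the trend problems, keeps the sums smaller. fit_parabola and solve3 are hypothetical helper names.

```python
def solve3(m, v):
    """Solve the 3x3 system m * s = v by Gaussian elimination with partial pivoting."""
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        # move the row with the largest pivot into position
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    # back substitution
    s = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s[r] = (aug[r][3] - sum(aug[r][k] * s[k] for k in range(r + 1, 3))) / aug[r][r]
    return s


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 from the three normal equations."""
    p = [sum(x ** k for x in xs) for k in range(5)]                   # p[k] = sum(x^k); p[0] = n
    q = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # q[k] = sum(x^k * y)
    m = [[p[0], p[1], p[2]],   # sum(y)    = n*a        + b*sum(x)   + c*sum(x^2)
         [p[1], p[2], p[3]],   # sum(xy)   = a*sum(x)   + b*sum(x^2) + c*sum(x^3)
         [p[2], p[3], p[4]]]   # sum(x^2y) = a*sum(x^2) + b*sum(x^3) + c*sum(x^4)
    a, b, c = solve3(m, q)
    return a, b, c


# Problem 28
a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 4), round(b, 4), round(c, 4))
```

In Problem 28 the four points lie exactly on the fitted parabola, so every residual is zero.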

Problem: 29

Fit a parabola (second-degree curve) to the data given below.

x 3 5 7 9 11 13

y 2 3 4 6 5 8

Answer:


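Let the parabola be $y = a + bx + cx^2$.

The normal equations are

$\sum y = na + b\sum x + c\sum x^2 \quad (1)$

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \quad (2)$

$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4 \quad (3)$

x y x² x³ x⁴ xy x²y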

3 2 9 27 81 6 18

5 3 25 125 625 15 75

7 4 49 343 2401 28 196

9 6 81 729 6561 54 486

11 5 121 1331 14641 55 605

13 8 169 2197 28561 104 1352

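∑x = 48 ∑y = 28 ∑x² = 454 ∑x³ = 4752 ∑x⁴ = 52870 ∑xy = 262 ∑x²y = 2732

Therefore equations (1), (2) and (3) become

$28 = 6a + 48b + 454c \quad (4)$

$262 = 48a + 454b + 4752c \quad (5)$

$2732 = 454a + 4752b + 52870c \quad (6)$

Solving, we get $a = 0.791$, $b = 0.4$ and $c = 0.0089$.

Therefore the best-fitting parabola is $y = 0.791 + 0.4x + 0.0089x^2$.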

Problem: 30

Fit a second-degree curve to the data given below.

Answer:

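Let the parabola be $y = a + bx + cx^2$.

The normal equations are

$\sum y = na + b\sum x + c\sum x^2 \quad (1)$

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \quad (2)$

$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4 \quad (3)$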


∑ ∑ ∑ ∑ ∑ ∑ ∑

Therefore equations ( ) ( ) and ( ) becomes

( )

( )

( )

Solving, we get and

Therefore the best-fitting second-degree curve is

Curve of the form

Let $(x_i, y_i),\ i = 1, 2, \dots, n$ be a set of observations related by the relation

.

Consider the curve

Taking log on both sides, we get

( )

( )

Assume and .

( ) ( )

It is a straight line.


Curve of the form

Let $(x_i, y_i),\ i = 1, 2, \dots, n$ be a set of observations related by the relation

.

Consider the curve

Taking log on both sides, we get

( )

( )

Assume and .

( ) ( )

It is a straight line.

Problem: 31

Fit a curve to the following data.

x 2 3 4 5 6

y 8.3 15.4 33.1 65.2 127.4

Answer:

Given the curve

Taking log on both sides, we get

( )

( )

Assume $X = x$ and $Y = \log_{10} y$.

( ) ( )

It is a straight line.

The normal equations are

$\sum Y = nA + B\sum X \quad (1)$

$\sum XY = A\sum X + B\sum X^2 \quad (2)$

x X = x y Y = log₁₀ y X² XY


2 2 8.3 0.92 4 1.84

3 3 15.4 1.19 9 3.56

4 4 33.1 1.52 16 6.08

5 5 65.2 1.81 25 9.07

6 6 127.4 2.11 36 12.63

∑X = 20 ∑Y = 7.55 ∑X² = 90 ∑XY = 33.18

Therefore equations (1) and (2) become

$7.55 = 5A + 20B \quad (3)$

$33.18 = 20A + 90B \quad (4)$

Solving, we get $A = 0.318$ and $B = 0.298$.

Since

and

The required curve is

( )
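Problems 31 and 32 both reduce to a straight-line fit of $Y = \log_{10} y$ on $X = x$. The sketch below works with unrounded logarithms, so its A and B differ slightly from the hand values obtained with two-decimal logs. For the back-transformation it assumes the curve $y = ab^x$, so that $a = 10^A$ and $b = 10^B$. If the curve is read instead as $y = ae^{bx}$, the same A and B apply and only the back-transformation changes, to $b = B/\log_{10} e$. fit_log_linear is a hypothetical name.

```python
from math import log10


def fit_log_linear(xs, ys):
    """Least-squares line Y = A + B*X with X = x and Y = log10(y); requires y > 0."""
    Y = [log10(y) for y in ys]
    n = len(xs)
    sx, sy = sum(xs), sum(Y)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * t for x, t in zip(xs, Y))
    B = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    A = (sy - B * sx) / n
    return A, B


# Problem 31, back-transformed under the assumed form y = a * b**x
A, B = fit_log_linear([2, 3, 4, 5, 6], [8.3, 15.4, 33.1, 65.2, 127.4])
a, b = 10 ** A, 10 ** B
print(f"A = {A:.4f}, B = {B:.4f}, y = {a:.3f} * {b:.3f}**x")
```

The intercept A is sensitive to rounding in $\sum Y$, because A is obtained as a small difference of larger sums. This is why carrying more decimal places in the log column is worthwhile.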

Problem: 32

Fit a curve to the following data.

x 0 1 2 3 4 5 6 7

y 10 21 35 59 92 200 400 610

Answer:

Given the curve

Taking log on both sides, we get

( )

( )

Assume $X = x$ and $Y = \log_{10} y$.

( ) ( )

It is a straight line.

The normal equations are

$\sum Y = nA + B\sum X \quad (1)$

$\sum XY = A\sum X + B\sum X^2 \quad (2)$

X = x y Y = log₁₀ y X² XY


0 10 1.00 0 0.00

1 21 1.32 1 1.32

2 35 1.54 4 3.09

3 59 1.77 9 5.31

4 92 1.96 16 7.86

5 200 2.30 25 11.51

6 400 2.60 36 15.61

7 610 2.79 49 19.50

∑X = 28 ∑Y = 15.29 ∑X² = 140 ∑XY = 64.19

Therefore equations (1) and (2) become

$15.29 = 8A + 28B \quad (3)$

$64.19 = 28A + 140B \quad (4)$

Solving, we get $A = 1.022$ and $B = 0.254$.

Since

and

The required curve is

( )

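Fitting a straight line trend by the method of least squares:

$Y = a + bX$, where the time variable $X$ is measured from the middle of the period so that $\sum X = 0$. The normal equations then give $a = \frac{\sum Y}{n}$ and $b = \frac{\sum XY}{\sum X^2}$.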

Problem: 33

Fit a straight line trend by the method of least squares to the following data. Also forecast

for the year 2015.

Year : 2005 2006 2007 2008 2009 2010 2011 2012

Earnings (in lakhs): 38 40 65 72 69 60 87 95

Answer:


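To simplify the calculation, subtract the mean of the years, $\bar{x} = \frac{\sum x}{n} = \frac{16068}{8} = 2008.5$, from each year $x$. The new variable $X = x - 2008.5$ is tabulated below.

Year (x) Earnings in lakhs (Y) X = x − 2008.5 X² XY

2005 38 −3.5 12.25 −133

2006 40 −2.5 6.25 −100

2007 65 −1.5 2.25 −97.5

2008 72 −0.5 0.25 −36

2009 69 0.5 0.25 34.5

2010 60 1.5 2.25 90

2011 87 2.5 6.25 217.5

2012 95 3.5 12.25 332.5

∑Y = 526 ∑X = 0 ∑X² = 42 ∑XY = 308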

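I. Fitting a trend line:

$Y = a + bX \quad (1)$, where $a = \frac{\sum Y}{n} = \frac{526}{8} = 65.75$ and $b = \frac{\sum XY}{\sum X^2} = \frac{308}{42} = 7.33$

Hence (1) becomes

$Y = 65.75 + 7.33X \quad (2)$

II. Forecast for the year 2015

For 2015, $X = 2015 - 2008.5 = 6.5$, so from (2)

$Y = 65.75 + 7.33\,(6.5) \approx 113.4$ lakhs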
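The shifted-origin procedure of Problems 33 to 35 is sketched below. It sets $X$ = year − mean year so that $\sum X = 0$, then takes $a = \sum Y/n$ and $b = \sum XY/\sum X^2$. fit_trend and forecast are hypothetical helper names.

```python
def fit_trend(years, values):
    """Straight-line trend Y = a + b*X, with X = year - mean(year) so that sum(X) = 0."""
    n = len(years)
    origin = sum(years) / n                  # middle of the period
    X = [t - origin for t in years]
    a = sum(values) / n                      # a = sum(Y) / n
    b = sum(x * y for x, y in zip(X, values)) / sum(x * x for x in X)   # b = sum(XY) / sum(X^2)
    return origin, a, b


def forecast(origin, a, b, year):
    """Trend value for a given year: Y = a + b * (year - origin)."""
    return a + b * (year - origin)


# Problem 33
origin, a, b = fit_trend(list(range(2005, 2013)), [38, 40, 65, 72, 69, 60, 87, 95])
print(origin, a, round(b, 3), round(forecast(origin, a, b, 2015), 2))
```

Shifting the origin to the middle of the period makes $\sum X = 0$. This decouples the two normal equations, which is why $a$ and $b$ can be written down directly without solving simultaneous equations.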


Problem: 34

Find the line of best fit for the following time series data.

Year : 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

Active in ratio in the XYZ Co. (%): 2 5 5 10 12 16 17 14 20 23

Also forecast the values for the years 2014 and 2015.

Answer:

To simplify the calculation, subtract the mean of the years, $\bar{x} = \frac{\sum x}{n} = \frac{20045}{10} = 2004.5$, from each year $x$. The new variable $X = x - 2004.5$ is tabulated below.

Year (x) Ratio in % (Y) X = x − 2004.5 X² XY

2000 2 −4.5 20.25 −9

2001 5 −3.5 12.25 −17.5

2002 5 −2.5 6.25 −12.5

2003 10 −1.5 2.25 −15

2004 12 −0.5 0.25 −6

2005 16 0.5 0.25 8

2006 17 1.5 2.25 25.5

2007 14 2.5 6.25 35

2008 20 3.5 12.25 70

2009 23 4.5 20.25 103.5

∑Y = 124 ∑X = 0 ∑X² = 82.5 ∑XY = 182

I. Fitting a trend line:

$Y = a + bX \quad (1)$, where $a = \frac{\sum Y}{n} = \frac{124}{10} = 12.4$ and $b = \frac{\sum XY}{\sum X^2} = \frac{182}{82.5} = 2.206$

Hence (1) becomes

$Y = 12.4 + 2.206X \quad (2)$

II. Forecast for the year 2014

For 2014, $X = 2014 - 2004.5 = 9.5$, so from (2)

$Y = 12.4 + 2.206\,(9.5) = 33.36$ %

III. Forecast for the year 2015

For 2015, $X = 2015 - 2004.5 = 10.5$, so from (2)

$Y = 12.4 + 2.206\,(10.5) = 35.56$ %

Problem: 35

The following data give the production (in ‘000 units) of a commodity for the years 2006 to 2012.

Fit a straight line trend and forecast the production for the year 2020.

Year : 2006 2007 2008 2009 2010 2011 2012

Production 6 7 5 4 6 7 5

Answer:

To simplify the calculation, subtract the mean of the years, $\bar{x} = \frac{\sum x}{n} = \frac{14063}{7} = 2009$, from each year $x$. The new variable $X = x - 2009$ is tabulated below.

Year (x) Production in ‘000 units (Y) X = x − 2009 X² XY

2006 6 −3 9 −18

2007 7 −2 4 −14

2008 5 −1 1 −5

2009 4 0 0 0

2010 6 1 1 6

2011 7 2 4 14

2012 5 3 9 15

∑Y = 40 ∑X = 0 ∑X² = 28 ∑XY = −2

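I. Fitting a trend line:

$Y = a + bX \quad (1)$, where $a = \frac{\sum Y}{n} = \frac{40}{7} = 5.714$ and $b = \frac{\sum XY}{\sum X^2} = \frac{-2}{28} = -0.071$

Hence (1) becomes

$Y = 5.714 - 0.071X \quad (2)$

II. Forecast for the year 2020

For 2020, $X = 2020 - 2009 = 11$, so from (2)

$Y = 5.714 - 0.071\,(11) \approx 4.93$ thousand units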


Two Marks

1. Define correlation. Give one example.

Answer:

If a change in one variable affects the change in the other variable, the variables are said to be correlated.

Ex:

Taller people have larger shoe sizes and shorter people have smaller shoe sizes.

2. Define positive correlation. Give one example.

Answer:

If the two variables deviate in the same direction, i.e., an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be positive.

Ex: Income and expenditure

3. Define negative correlation. Give one example.

Answer:

If the two variables deviate in opposite directions, i.e., an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be negative.

Ex: Price and demand of a product

4. Write down Spearman’s Rank Correlation formula.

Answer:

$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$

where $n$ is the number of data points of the two variables and $d_i$ is the difference in the ranks of the $i$th element of each random variable considered. The value of $\rho$ lies between $-1$ and $+1$.
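Spearman's formula can be evaluated directly from two rank lists. The sketch below assumes untied ranks. With tied ranks, texts usually assign average ranks and add a correction term to $\sum d^2$, which this sketch omits. spearman_rho is a hypothetical name, and the usage line applies it to exercise Problem 3.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumes untied ranks; no tie-correction term is applied.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length with n >= 2")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Exercise Problem 3
print(round(spearman_rho([1, 2, 3, 4, 5, 6, 7], [4, 3, 1, 2, 6, 5, 7]), 4))
```

Identical rankings give $\rho = 1$ and completely reversed rankings give $\rho = -1$, which matches the stated range.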

5. What are the merits of the rank correlation coefficient?

Answer:


Spearman’s rank correlation coefficient can be interpreted in the same way as Karl Pearson’s correlation coefficient;

It is easy to understand and easy to calculate;

6. What are the demerits of the rank correlation coefficient?

Answer:

The product moment correlation coefficient can be calculated for a bivariate frequency distribution, but the rank correlation coefficient cannot; and

When n is large, this formula is time consuming.

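7. Write down Karl Pearson’s coefficient of correlation.

Answer:

$r(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y}$, where $\operatorname{cov}(X, Y) = \frac{\sum XY}{n} - \bar{X}\,\bar{Y}$, $\sigma_X = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$ and $\sigma_Y = \sqrt{\frac{\sum Y^2}{n} - \bar{Y}^2}$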

8. Define Coefficient of Determination

Answer:

The coefficient of determination ($R^2$, read "R-squared") is the proportion of the variance in the dependent variable that is predicted from the independent variable. It indicates how much of the variation in the given data set is explained by the regression.

9. Write any two properties of the Coefficient of Determination

Answer:

The coefficient of determination is the square of the correlation coefficient (r); thus it ranges from 0 to 1.

With linear regression, the coefficient of determination is equal to the square of the correlation between the x and y variables.
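The second property can be checked numerically. For a least-squares straight line, the explained proportion $1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y})^2$ equals $r^2$. The sketch below computes both quantities for the Problem 18 data. r_squared_two_ways is a hypothetical name.

```python
from math import sqrt


def r_squared_two_ways(xs, ys):
    """Return (r^2, 1 - SSE/SST) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)                       # total sum of squares, SST
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))    # unexplained variation
    return r * r, 1 - sse / syy


r2, explained = r_squared_two_ways([1, 2, 3, 4, 5, 6, 7], [2, 4, 7, 6, 5, 6, 5])
print(round(r2, 4), round(explained, 4))
```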

10. Define Regression

Answer:

Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

The equation of the line of regression of y on x is

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

The equation of the line of regression of x on y is

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

11. What are the differences between correlation and regression?

Answer: Correlation measures the strength and direction of the linear relationship between two variables and treats them symmetrically. Regression, on the other hand, fits the best line and estimates one variable on the basis of another: its goal is to predict values of the random (dependent) variable from the values of the fixed (independent) variable.

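12. What is a regression coefficient?

Answer: A regression coefficient is the slope of a line of regression. The regression coefficient of y on x is $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$ and that of x on y is $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$. The correlation coefficient is their geometric mean, $r = \pm\sqrt{b_{yx}\,b_{xy}}$.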

13. Write down the two regression lines.

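Answer: The equation of the line of regression of y on x is

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

The equation of the line of regression of x on y is

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$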

14. The two regression equations of two random variables x and y are

and . Find the mean values of x and y.

Answer:

Since both lines of regression pass through the point of means $(\bar{x}, \bar{y})$, this point must satisfy both lines.

( )

( )

Solving ( ) and ( ), we get


Problem: 1

The following table provides data about the percentage of students who have free

university meals and their CGPA scores. Calculate the Spearman’s Rank Correlation

between the two and interpret the result.

State University

% of students having free meals

% of students scoring above 8.5 CGPA

Pune 14.4 54

Chennai 7.2 64

Delhi 27.5 44

Kanpur 33.8 32

Ahmedabad 38.0 37

Indore 15.9 68

Guwahati 4.9 62

Problem: 2

Compute the coefficient of rank correlation between sales and advertisement expressed in

thousands of dollars from the following data:

Sales 90 85 68 75 82 80 95 70

Advertisement 7 6 2 3 4 5 8 1

Problem: 3

Find the rank correlation coefficient from the following data.

Rank in X 1 2 3 4 5 6 7

Rank in Y 4 3 1 2 6 5 7

Problem: 4

The ranks of 16 students in mathematics and physics are as follows. Find the rank correlation coefficient for proficiency in mathematics and physics.

Rank in Maths 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rank in Physics 1 10 3 4 5 7 2 6 8 11 15 9 14 12 16 13

Problem: 5

The ranks of 5 students in three subjects, Computer, Physics and Statistics, are given below. Test which two subjects show the same trend.

Rank in Computer 2 4 5 1 3

Rank in Physics 5 1 2 3 4

Rank in Statistics 2 3 5 4 1


Problem: 6

Determine the rank correlation coefficient for the following data.

68 64 75 50 64 80 75 40 55 64

62 58 68 45 81 60 68 48 50 70

Problem: 7

A sample of 12 fathers and their eldest sons gave the following data about their heights in inches.

Fathers 65 63 67 64 68 62 70 66 68 67 69 71

Sons 68 66 68 65 69 66 68 65 71 67 68 70

Problem: 8

Find the correlation coefficient for the following data.

X 10 14 18 22 26 30

Y 18 12 24 6 30 36

Problem: 9

The table below shows the number of absences, X, in a Calculus course and the final exam grade, Y, for 7 students. Find the correlation coefficient and interpret your result.

X 1 0 2 6 4 3 3

Y 95 90 90 55 70 80 85

Problem: 10

The marks obtained by 10 students in Mathematics and Statistics are given below.

Find the correlation coefficient between the two subjects.

Marks in Maths 75 30 60 80 53 35 15 40 38 48

Marks in Stats 85 45 54 91 58 63 35 43 45 44

Problem: 11

Compute the coefficient of correlation between X and Y using the following data:

X 1 3 5 7 8 10

Y 8 12 15 17 18 20

Problem: 12

Calculate the coefficient of correlation for the following data:

9 8 7 6 5 4 3 2 1

15 16 14 13 11 12 10 8 9


Problem: 13

Obtain the equations of the lines of regression from the following data:

1 2 3 4 5 6 7

9 8 10 12 11 13 14

Problem: 14

From the following data find

The two regression lines

The coefficient of correlation between the marks in economics and statistics

The most likely marks in statistics when marks in economics are 30.

Marks in Economics: 25 28 35 32 31 36 29 38 34 32

Marks in Statistics: 43 46 49 41 36 32 31 30 33 39

Problem: 15

A tyre manufacturing company is interested in removing pollutants from the exhaust at the

factory, and cost is a concern. The company has collected data from other companies

concerning the amount of money spent on environmental measures and the resulting

amount of dangerous pollutants released (as a percentage of total emissions)

Money spent (Rupees in lakhs): 8.4 10.2 16.5 21.7 9.4 8.3 11.5 18.4 16.7 19.3 28.4 4.7 12.3

Percentage of dangerous pollutants: 35.9 31.8 24.7 25.2 36.8 35.8 33.4 25.4 31.4 27.4 15.8 31.5 28.9

a) Compute the regression equation.

b) Predict the percentage of dangerous pollutants released when Rs. 20,000 is spent on

control measures.

c) Find the standard error of the estimate (regression line).

Problem: 16

The quantity of a raw material purchased by a company at the specified prices during the

12 months of 1992 is given below.

MONTH PRICE/KG QUANTITY (KG)

Jan 96 250

Feb 110 200


Mar 100 250

Apr 90 280

May 86 300

June 92 300

July 112 220

Aug 112 220

Sep 108 200

Oct 116 210

Nov 86 300

Dec 92 250

Find the regression equation based on the above data.

Can you estimate the approximate quantity likely to be purchased if the price shoots up to Rs 124/kg?

Hence or otherwise obtain the coefficient of correlation between the prevailing price and the quantity demanded.

Problem: 17

Find the standard error of the estimate from the data given below.

X 1 2 3 4 5

Y 1 2 1.3 3.75 2.25

Problem: 18

Find the standard error of the estimate from the data given below.

X 1 2 3 4 5 6 7

Y 2 4 7 6 5 6 5

Problem: 19

The two lines of regression are . The

variance of is 9. Find (i) the mean values of x and y and (ii) the correlation coefficient between x and y.

Problem: 20

The two lines of regression are . Find ( ) and

the correlation coefficient between x and y.

Problem: 21

The regression equation of and is . If the mean value of

and the correlation coefficient.


Problem: 22

If and are uncorrelated random variables with variances and . Find the

correlation coefficient between and .

Problem: 24

If the independent random variables and have the variances 3 and

respectively, find the correlation coefficient between and .

Problem: 25

By the method of least squares find the best fitting straight line to the data given below.

x 5 10 15 20 25

y 15 19 23 26 30

Problem: 26

Fit a straight line to the following data. Also find the value of y at

x 0 1 2 3 4

y 1 1.8 3.3 4.5 6.3

Problem: 27

Fit a straight line to the following data. Also estimate the value at .

x 71 68 73 69 67 65 66 67

y 69 72 70 70 68 67 68 64

Problem: 28

By the method of least squares, find the best-fitting parabola (second-degree curve) to the data given below.

x 1 2 3 4

y 1.7 1.8 2.3 3.2

Problem: 29

Fit a parabola (second-degree curve) to the data given below.

x 3 5 7 9 11 13

y 2 3 4 6 5 8

Problem: 30

Fit a second-degree curve to the data given below.


Problem: 31

Fit a curve to the following data.

x 2 3 4 5 6

y 8.3 15.4 33.1 65.2 127.4

Problem: 32

Fit a curve to the following data.

x 0 1 2 3 4 5 6 7

y 10 21 35 59 92 200 400 610

Problem: 33

Fit a straight line trend by the method of least squares to the following data. Also forecast

for the year 2015.

Year : 2005 2006 2007 2008 2009 2010 2011 2012

Earnings (in lakhs): 38 40 65 72 69 60 87 95

Problem: 34

Find the line of best fit for the following time series data.

Year : 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

Active in ratio in the XYZ Co. (%): 2 5 5 10 12 16 17 14 20 23

Also forecast the values for the years 2014 and 2015.

Problem: 35

The following data give the production (in ‘000 units) of a commodity for the years 2006 to 2012.

Fit a straight line trend and forecast the production for the year 2020.

Year : 2006 2007 2008 2009 2010 2011 2012

Production 6 7 5 4 6 7 5


“I am a slow walker, but I never walk back.”

― Abraham Lincoln