ba assignment 3

24
BUSINESS ANALYTICS ASSIGNMENT 3 XAVIER INSTITUTE OF SOCIAL SERVICE, PURULIA ROAD, RANCHI Submitted by: RADHIKA PADIA (ROLL NO 15)

Upload: radhikapadia

Post on 16-Jul-2016

226 views

Category:

Documents


0 download

DESCRIPTION

business analytics

TRANSCRIPT

BUSINESS ANALYTICS

ASSIGNMENT 3

XAVIER INSTITUTE OF SOCIAL SERVICE,

PURULIA ROAD, RANCHI

Submitted by:

RADHIKA PADIA (ROLL NO 15)

THE DATA TABLE:

CONCEPT CIS

AGE

MARTIAL

NUMCAR

AVAGE

NUMTRIP

CONCEPT1

NUMCAR1 GROUPS

3 0 24 0 4 2.5 1 0 1 11 0 22 0 1 3 1 0 0 12 0 27 0 1 2.5 0 0 0 12 0 26 0 2 2 2 0 1 12 0 29 0 1 2.5 1 0 0 11 0 23 0 1 3 0 0 0 12 0 24 0 2 3 0 0 1 11 0 22 0 2 3 1 0 1 12 0 26 0 1 2 1 0 0 15 0 48 0 1 0.5 3 1 0 14 0 37 0 1 1.5 2 1 0 12 0 32 1 3 3 0 0 1 22 0 32 1 1 1 0 0 0 21 0 37 1 1 1 0 0 0 22 0 41 1 1 2 3 0 0 23 0 23 1 1 2 0 0 0 21 0 34 1 5 3 1 0 1 22 0 38 1 2 2 0 0 1 23 0 32 1 1 1.5 0 0 0 22 0 29 1 3 2.5 0 0 1 23 0 36 1 1 1.5 0 0 0 23 0 34 1 1 1.5 0 0 0 23 0 32 1 2 2 1 0 1 21 0 28 1 1 2.5 0 0 0 24 0 29 1 1 2 3 1 0 24 0 47 1 2 1 5 1 1 25 0 24 1 1 0.5 9 1 0 24 0 43 1 2 1 2 1 1 24 0 42 1 3 1.5 2 1 1 25 0 44 1 1 0.5 7 1 0 22 1 24 0 3 1 2 0 1 32 1 24 0 4 2 0 0 1 31 1 23 0 2 2.5 0 0 1 33 1 36 0 1 2 0 0 0 33 1 39 0 1 2 0 0 0 31 1 30 0 1 2 0 0 0 31 1 28 0 2 4.5 2 0 1 34 1 42 0 1 2 2 1 0 33 1 39 1 1 1.5 1 0 0 41 1 27 1 1 2 3 0 0 43 1 36 1 2 2.5 1 0 1 43 1 47 1 1 2 1 0 0 4

3 1 42 1 2 3.5 1 0 1 44 1 42 1 1 0.5 3 1 0 45 1 47 1 1 1 4 1 0 44 1 43 1 2 1.5 4 1 1 45 1 62 1 1 0.5 6 1 0 45 1 55 1 2 0.5 4 1 1 44 1 39 1 3 1.5 5 1 1 45 1 58 1 1 2 6 1 0 44 1 43 1 1 0.4 2 1 0 45 1 59 1 1 0.5 7 1 0 44 1 43 1 3 1.5 3 1 1 44 1 42 1 1 1.5 4 1 0 44 1 47 1 4 1.5 4 1 1 44 1 38 1 2 1 3 1 1 45 1 37 1 2 0.8 8 1 1 44 1 51 1 1 1 2 1 0 44 1 47 1 2 1.5 1 1 1 45 1 51 1 1 2 6 1 0 4

QUESTON 1:

Divide the sample into two groups a. Those showing high interest – “4” or “5” rating on CONCEPT b. Those showing low interest – “1”or “2”or “3” rating on CONCEPT Cross tabulate high versus low interest with CIS. How strong is the association between interest in the policy and the current insurance supplier? Is the association statistically significant? What does it tell you?

SOLUTION:

THE RECODED VALUES OF CONCEPT IS NAMED AS CONCEPT1 LABELLED AS FOLLOWS:

0: LOW INTEREST GROUP 1:HIGH INTEREST GROUP

Cross tabulation: Two-Way

Counts

CONCEPT1(rows) by CIS(columns)

0 1 Total

0 22 12 34

1 8 18 26

Total 30 30 60

Chi-Square Tests of Association for CONCEPT1 and CIS

Test Statistic Value Df p-Value

Pearson Chi-Square 6.787 1.000 0.009

Number of Valid Cases: 60

INTERPRETATION:

Let us assume,

Ho: the association between interest in the policy and the current insurance supplier is not strong.

H1: the association between interest in the policy and the current insurance supplier is strong.

From the above analysis, we get to know that p value is 0.009 which is less than 0.05 (at 5% level of significance).

Therefore Ho is rejected and H1 is accepted

We can conclude that the association between interest in the policy and the current insurance supplier is very strong and significant.

QUESTION2:

We can consider the concept rating (CONCEPT) as an Independent Variable and the remaining 6 variables as predictor variables. Regress CONCEPT on the other variables:

a. Interpret the regression equation and indicate the extent to which those variations in the predictor variables explain the variation in the independent variable? b. Is each of the predictor variables significant at 0.05 level? Can a simpler mode (involving fewer predictors be developed? If so what is the model and what is the percentage improvement of the simple model over the full model?

SOLUTION:PART a:

▼OLS Regression

Dependent Variable CONCEPT

N 60

Multiple R 0.869

Squared Multiple R 0.755

Adjusted Squared Multiple R 0.727

Standard Error of Estimate 0.705

Regression Coefficients B = (X'X)-1X'Y

Effect Coefficient Standard Error Std.Coefficient

Tolerance t p-Value

CONSTANT 1.360 0.588 0.000 . 2.315 0.025

CIS -0.067 0.211 -0.025 0.746 -0.318 0.752

MARTIAL -0.098 0.239 -0.034 0.673 -0.409 0.684

AGE 0.056 0.013 0.418 0.451 4.132 0.000

NUMCAR 0.033 0.099 0.023 0.914 0.328 0.744

AVAGE -0.442 0.140 -0.282 0.579 -3.160 0.003

NUMTRIP 0.226 0.051 0.383 0.627 4.458 0.000

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 81.359 6 13.560 27.248 0.000

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Residual 26.375 53 0.498    

INTERPRETATION:

The regression equation is:

CONCEPT=1.360−0.067CIS−0.098 AGE+0.056MARTIAL+0.033 AVAGE−0.442 NUMCAR+0.226 NUMTRIP

Let us assume,

Ho: the regression equation is not significant in predicting the dependent variable.

H1: the regression equation is significant in predicting the dependent variable.

From the above analysis, we can see that -

The p value is 0.00 which is less than 0.05 at 5% level of significance ,therefore Ho is rejected and H1 is accepted i.e. , the regression equation is significant in predicting the dependent variable.

Also, squared multiple r is 0.755, which indicates that the goodness of fit is at a fairly good level.

THE p value of the constant is 0.025 which suggests that changes in the predictor variables are associated with changes in the dependent variable.

PART b:

1 ▼OLS Regression

Dependent Variable CONCEPT

N 60

Multiple R 0.323

Squared Multiple R 0.105

Adjusted Squared Multiple R 0.089

Standard Error of Estimate 1.290

Regression Coefficients B = (X'X)-1X'Y

Effect Coefficient Standard Error Std.Coefficient

Tolerance t p-Value

CONSTANT 2.633 0.235 0.000 . 11.184 0.000

CIS 0.867 0.333 0.323 1.000 2.603 0.012

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 11.267 1 11.267 6.774 0.012

Residual 96.467 58 1.663

INTERPRETATION:

The regression equation is :

CONCEPT=2.633+0.867CIS

H0: the regression equation is not a significant predictor of the dependent variable (concept1)

H1: the regression equation is a significant predictor of the dependent variable (Concept1)

The p value is 0.012, which is less than 0.05 at 5% level of significance, which indicates that H0 is rejected and H1 is accepted.

Therefore CIS is a significant predictor of concept at 0.05 level of significance.

2. OLS Regression

Regression Coefficients B = (X'X)-1X'YEffect Coefficient Standard Error Std.

CoefficientTolerance t p-Value

CONSTANT -0.549 0.449 0.000 . -1.222 0.227

AGE 0.098 0.012 0.739 1.000 8.348 0.000

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 58.796 1 58.796 69.685 0.000

Residual 48.937 58 0.844

INTERPRETATION:

The regression equation is: concept=−0.549+0.098 AGE

H0: the regression equation is not a significant predictor of the dependent variable( concept1)

H1: the regression equation is a significant predictor of the dependent variable( Concept1)

The p value is 0.000 , which is less than 0.05 at 5% level of significance ,which indicates that H0 is rejected and H1 is accepted.

Therefore AGE is a significant predictor of concept at 0.05 level of signicance.

3. OLS Regression

Regression Coefficients B = (X'X)-1X'YEffect Coefficient Standard Error Std.

CoefficientTolerance t p-Value

CONSTANT 2.211 0.282 0.000 . 7.851 0.000

MARTIAL 1.253 0.341 0.435 1.000 3.679 0.001

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 20.380 1 20.380 13.532 0.001

Residual 87.353 58 1.506

INTERPRETATION:

The regression equation is: concept=2.211+1.253MARTIAL

H0: the regression equation is not a significant predictor of the dependent variable (concept1)

H1: the regression equation is a significant predictor of the dependent variable ( Concept1)

The p value is 0.001, which is less than 0.05 at 5% level of significance , which indicates that H0 is rejected and H1 is accepted.

Therefore MARTIAL is a significant predictor of concept at 0.05 level of signicance.

4. OLS Regression

Regression Coefficients B = (X'X)-1X'Y

Effect Coefficient Standard Error Std.Coefficient

Tolerance t p-Value

CONSTANT 3.395 0.352 0.000 . 9.633 0.000

NUMCAR -0.195 0.182 -0.139 1.000 -1.073 0.288

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 2.095 1 2.095 1.150 0.288

Residual 105.638 58 1.821

INTERPRETATION:

The regression equation is: concept=3.395−0.195 NUMCAR

H0: the regression equation is not a significant predictor of the dependent variable( concept1)

H1: the regression equation is a significant predictor of the dependent variable( Concept1)

The p value is 0.288 , which is MORE than 0.05 at 5% level of significance ,which indicates that H0 is accepted and H1 is rejected.

Therefore NUMCAR is NOT a significant predictor of concept at 0.05 level of signicance.

5. OLS Regression

Regression Coefficients B = (X'X)-1X'YEffect Coefficient Standard Error Std.

CoefficientTolerance t p-Value

CONSTANT 4.968 0.295 0.000 . 16.856 0.000

AVAGE -1.074 0.150 -0.685 1.000 -7.164 0.000

Analysis of Variance

Source SS df Mean Squares F-Ratio p-Value

Regression 50.576 1 50.576 51.321 0.000

Residual 57.158 58 0.985

INTERPRETATION:

The regression equation is: concept=4.968−1.074 AVAGE

H0: the regression equation is not a significant predictor of the dependent variable( concept1)

H1: the regression equation is a significant predictor of the dependent variable( Concept1)

The p value is 0.000 , which is less than 0.05 at 5% level of significance ,which indicates that H0 is rejected and H1 is accepted.

Therefore AVAGE is a significant predictor of concept at 0.05 level of signicance.

6. OLS Regression

Regression Coefficients B = (X'X)-1X'Y

Effect Coefficient Standard Error Std.Coefficient

Tolerance t p-Value

CONSTANT 2.136 0.166 0.000 . 12.843 0.000

NUMTRIP 0.430 0.053 0.729 1.000 8.116 0.000

Analysis of VarianceSource SS Df Mean Squares F-Ratio p-Value

Regression 57.286 1 57.286 65.863 0.000

Residual 50.447 58 0.870

INTERPRETATION:

The regression equation is: concept=2.136+0 .430NUMTRIP

H0: the regression equation is not a significant predictor of the dependent variable( concept1)

H1: the regression equation is a significant predictor of the dependent variable( Concept1)

The p value is 0.000 , which is less than 0.05 at 5% level of significance ,which indicates that H0 is rejected and H1 is accepted.

Therefore NUMTRIP is a significant predictor of concept at 0.05 level of signicance.

A SIMPLER MODEL (involving fewer predictors) can be developed. This is as follows:

Best Subset Regression:DEPENDENT VARIABLE: CONCEPT

Model No R-Sq Adjusted R-Sq Mallows' Cp MSE Variables

1 0.546 0.538 42.339 0.844 AGE

2 0.532 0.524 45.374 0.870 NUMTRIP

3 0.707 0.696 9.524 0.555 AGE, NUMTRIP

4 0.662 0.650 19.222 0.639 AVAGE, AGE

5 0.754 0.741 1.295 0.474 AVAGE, AGE, NUMTRIP

6 0.709 0.693 11.100 0.561 CIS, AGE, NUMTRIP

7 0.754 0.736 3.190 0.481 AVAGE, MARTIAL, AGE, NUMTRIP

8 0.754 0.736 3.230 0.482 AVAGE, NUMCAR, AGE, NUMTRIP

9 0.755 0.732 5.101 0.489 AVAGE, MARTIAL, NUMCAR, AGE, NUMTRIP

10 0.755 0.732 5.108 0.489 AVAGE, MARTIAL, CIS, AGE, NUMTRIP

11 0.755 0.727 7.000 0.498 NUMTRIP, AVAGE, NUMCAR, MARTIAL, AGE, CIS

Model No AIC AICC BIC Variables

1 164.044 164.473 170.327 AGE

2 165.868 166.296 172.151 NUMTRIP

3 139.824 140.551 148.201 AGE, NUMTRIP

4 148.348 149.076 156.726 AVAGE, AGE

5 131.290 132.401 141.762 AVAGE, AGE, NUMTRIP

6 141.423 142.534 151.894 CIS, AGE, NUMTRIP

7 133.172 134.756 145.738 AVAGE, MARTIAL, AGE, NUMTRIP

8 133.217 134.802 145.783 AVAGE, NUMCAR, AGE, NUMTRIP

9 135.071 137.225 149.731 AVAGE, MARTIAL, NUMCAR, AGE, NUMTRIP

10 135.078 137.232 149.739 AVAGE, MARTIAL, CIS, AGE, NUMTRIP

11 136.956 139.780 153.711 NUMTRIP, AVAGE, NUMCAR, MARTIAL, AGE, CIS

The Best Models:

Criteria Value Best Subset Model

Adjusted R-Sq 0.741 AVAGE, AGE, NUMTRIP

The Best Models:

Criteria Value Best Subset Model

AIC 131.290 AVAGE, AGE, NUMTRIP

AIC (Corrected) 132.401 AVAGE, AGE, NUMTRIP

Schwarz's BIC 141.762 AVAGE, AGE, NUMTRIP

INTERPRETATION:

From the tables above, we see that the adjusted R sq of AVAGE, AGE, NUMTRIP is highest, i.e. 0.741 which indicates that it is the simple and best subset model.The AIC value of AVAGE, AGE, NUMTRIP is 131.290, which is the lowest, hence the best subset.

So, the percentage improvement of the simpler model over the full model would be as follows: [(0.741-0.727)/ 0.727]*100= 1.9257%

QUESTION 3:

Divide the sample into 4 groups: Rushmore – Single, Rushmore – Married, Other Company – Single, and Other Company – Married. Run a single factor ANOVA to test the null hypothesis that the mean of the CONCEPT for the four groups are the same at 5% level of significance? If not which group has the highest average rating?

SOLUTION:

Analysis of VarianceEffects coding used for categorical variables in model.

The categorical values encountered during processing are

Variables Levels

GROUPS (4 levels) 1.000 2.000 3.000 4.000

Dependent Variable CONCEPT

N 60

Multiple R 0.563

Squared Multiple R 0.317

Analysis of Variance

Source Type III SS df Mean Squares F-Ratio p-Value

GROUPS 34.150 3 11.383 8.663 0.000

Error 73.583 56 1.314

INTERPRETATION:

LET US ASSUME:Ho: the mean of the CONCEPT for the four groups are the same.H1: the mean of the CONCEPT for the four groups are different. From the analysis of variance, we get to know that p value is 0.000 at 5% level of significance ,which is less than 0.05.therefore null hypothesis is rejected and alternate hypothesis is accepted. We interpret that at least one of the means is different.

Now to know which mean is different, we need to perform the pairwise comparison, PAIRWISE COMPARISON:Post Hoc Test of CONCEPT

Using least squares means.

Using model MSE of 1.314 with 56 df.

Tukey's Honestly-Significant-Difference Test

GROUPS(i) GROUPS (j) Difference p-Value 95% Confidence Interval

        Lower Upper

1 2 -0.569 0.560 -1.719 0.581

1 3 0.148 0.992 -1.263 1.558

1 4 -1.727 0.001 -2.848 -0.606

2 3 0.717 0.454 -0.562 1.996

2 4 -1.158 0.011 -2.109 -0.207

3 4 -1.875 0.001 -3.128 -0.622

INTERPRETATION:

We compare the p value of all the 6 groups and select those p values whose value is less than 0.05.

From the above table, p values of group (1,4) , (2,4),( 3,4) are less than 0.05.

Amongst the group , we get to know that variable 4 (rushmore married) is common in all the groups having p value less than 0.05.

Therefore , it means that the mean of RUSHMORE MARRIED(4) is different .

QUESTION 4:

Divide the NUMCAR as follows – One and More than One. Now using the CONCEPT and the 4 groups (as developed in 3 above) run a 2 way ANOVA with the second concept as NUMCAR groups. Is there any difference between the results obtained in 3 above and this new 2 way ANOVA at 5% level of significance?

SOLUTION:

Analysis of Variance

Effects coding used for categorical variables in model.

The categorical values encountered during processing are

Variables Levels

GROUPS (4 levels) 1.000 2.000 3.000 4.000

NUMCAR1 (2 levels) 0.000 1.000

Dependent Variable CONCEPT

N 60

Multiple R 0.593

Squared Multiple R 0.351

Analysis of Variance

Source Type III SS df Mean Squares F-Ratio p-Value

GROUPS 34.538 3 11.513 8.568 0.000

NUMCAR1 2.614 1 2.614 1.945 0.169

GROUPS *NUMCAR1 2.430 3 0.810 0.603 0.616

Error 69.873 52 1.344    

INTERPRETATION:

We run a two way annova with the 4 GROUPS and NUMCAR.Let us assume, Hog: the average means of CONCEPT is similar as the average means of GROUPS.H1 g: the average means of CONCEPT is different from the average means of GROUPS.

Ho n: the average means of CONCEPT is similar as the average means of NUMCARH1 n: the average means of CONCEPT is different from the average means of NUMCAR. From the p values, we can see the null hypothesis (ho g) is rejected due to its value being less than 0.05, which means that the average means of CONCEPT is different from the average means of GROUPS.

Similarly, the p value of NUMCAR is greater than 0.05, therefore we accept the null hypothesis( ho n), i.e. , the average means of CONCEPT is similar as the average means of NUMCAR.

Now to know which variable of the GROUP has a different mean, we perform a pairwise comparison of the GROUPS.

PAIRWISE COMPARISON

Post Hoc Test of CONCEPTUsing least squares means.Using model MSE of 1.344 with 52 df.

Tukey's Honestly-Significant-Difference Test

Groups I) Groups (J) Difference p-Value 95% Confidence Interval

Lower Upper

1 2 -0.615 0.529 -1.781 0.550

1 3 0.089 0.998 -1.340 1.519

1 4 -1.786 0.001 -2.922 -0.650

2 3 0.705 0.483 -0.592 2.001

2 4 -1.170 0.012 -2.134 -0.207

3 4 -1.875 0.001 -3.145 -0.605

INTERPRETATION:

We compare the p value of all the 6 groups and select those p values whose value is less than 0.05.

From the above table, p values of group (1,4) , (2,4),( 3,4) are less than 0.05.

Amongst the group , we get to know that variable 4 (Rushmore Married) is common in all the groups having p value less than 0.05.

Therefore , it means that the mean of RUSHMORE MARRIED(4) is different .

Therefore, there is no difference in the results obtained in question 3 above and this new 2 way ANOVA at 5% level of significance.

QUESTION 5:

Factor analyse the full 60x7 data matrix using principal component analysis using Varimax rotation. Apply Kaiser’s criterion (eigenvalue > 1) to extract the principal components. How will you interpret each set of rotated factor loadings?

▼Factor Analysis

Latent Roots (Eigenvalues)

1 2 3 4 5 6 7

3.413 1.065 0.913 0.663 0.432 0.353 0.161

Component Loadings

1 2

CONCEPT 0.906 -0.014

CIS 0.460 0.496

AGE 0.855 0.100

MARTIAL 0.619 0.075

NUMCAR -0.207 0.853

AVAGE -0.779 0.269

NUMTRIP 0.786 0.048

Variance Explained by Components

1 2

3.413 1.065

Percent of Total Variance Explained

1 2

48.754 15.217

Rotated Loading Matrix (VARIMAX, Gamma = 1.000000)

1 2

CONCEPT 0.904 -0.051

CIS 0.481 0.477

AGE 0.858 0.065

MARTIAL 0.621 0.049

NUMCAR -0.171 0.861

AVAGE -0.767 0.301

NUMTRIP 0.788 0.015

"Variance" Explained by Rotated Components

1 2

2.791 1.063

Percent of Total Variance Explained

1 2

39.871 15.182

INTERPRETATION:

There are two latent factors at work which explains 63.97% of the total market behaviour of the rushmore insurance.

P1=0.906CONCEPT+0.460CIS+0.855 AGE+0.619MARTIAL−0.207 NUMCAR−0.779 AVAGE+0.786 NUMTRIP

P2=−0.014CONCEPT +0.496CIS+0.100 AGE+0.075MARTIAL+0.853 NUMCAR+0.269 AVAGE+0.048NUMTRIP

FACTORS PRINCIPAL COMPONENTS

CONCEPT 1

CIS 1AGE 1

MARTIAL 1NUMCAR 1AVAGE 2

NUMTRIP 1

Nomenclature of PC1 consisting of factors- CONCEPT, CIS, AGE, MARTIAL, AVAGE, NUMTRIP is

“SELF CHARACTERISTICS”

Nomenclature of PC2 consisting of factors- NUMCAR is

“USAGE CHARACTERISTICS”

Therefore we can say that 48.754% of market would buy insurance depending on self characteristics and 15.21% of market would buy insurance depending on the car characteristics.

RECOMMENDATIONS:

1.WISDOM OF OFFERING :

From the CHI SQUARE test of association, we understood that the association between interest in the policy and the current insurance supplier is very strong and significant, which means that the policy would be acceptable by the majority of respondents.

2. TARGET MARKET:

The factors influencing the respondent to buy the insurance are:

The age of the respondents, The marital status, Average age of cars owned The number of trips taken by the car owned

Tukey's Honestly-Significant-Difference TestAGE1(i) AGE1(j) Difference p-Value 95% Confidence Interval

Lower Upper1 2 -0.932 0.022 -1.762 -0.1031 3 -1.991 0.000 -2.821 -1.1621 4 -2.883 0.000 -4.054 -1.7132 3 -1.059 0.010 -1.921 -0.1962 4 -1.951 0.000 -3.145 -0.7573 4 -0.892 0.208 -2.086 0.302

The coding of age were as follows:

20-30: 1 30-40: 2 40-50: 3 50 and above : 4

After the pairwise comparison, we can see that AGE GROUP 20-40 is the segment highly interested in the insuance policy.

Also, from the above analysis, (question 4) , we concluded that the RUSHMORE MARRIED group is the most significant group and therefore this segment can serve as the target market of the insurance company.

From the factor analysis, we saw that 84.96% of market would buy insurance depending on self characteristics and 12.948% of market would buy insurance depending on the car characteristics.

3.FURTHER RESEARCH REQUIRED:

Cluster analysis could be done to segment the market further more.