Business Insights Evaluation of a Telecom Client Dataset Using R


Post on 09-Jan-2017



Telecom Dataset Analysis

AbdulMajedRaja


Executive Summary:

This report summarises the statistical findings for a telecom client. The primary objective of the analysis is to find significant insights that can help the company make the right business decisions to grow and to solve some of its problems. The report addresses a few problems: understanding the factor that drives an increase in customer service calls, exploring upselling options for the company and, most importantly, finding the factors that could drive customer churn.

Based on the statistical analysis, the report concludes that the company has to grow its customer base in California, which is also the state where the churn rate is comparatively very high, and that necessary action should be taken to control it. The report also finds that most customers prefer both the International plan and the Voice Mail plan, which offers the company a bundling and upselling option. It further finds that the number of customer service calls is driven by the total number of other calls the customer makes; hence a spike in total calls means the company has to strengthen its customer service team.

Most importantly, the report finds that none of the given factors (customer service calls, Voice Mail plan or International plan choice) contributes significantly to churn. There must be some other factor, not present in the given data, that the company has to investigate.


Table of Contents

1. Introduction:
1.1 Business Problem Formulation:
2. Data Analysis:
2.1 Methodology:
2.2 Data Cleaning:
2.3 State-wise Customer Distribution:
2.4 State-wise Churn Rate:
2.5 Drivers influencing Customer Service Calls:
2.6 Optional Offerings Associations:
2.7 Factors driving Churn:
3. Conclusion:


List of Tables:

Table 1 - Variables Missing Value Count
Table 2 - Dataset Summary
Table 3 - States with most & least customers
Table 4 - Crosstab between VMail Plan & International Plan

List of Figures:

Figure 1 - State-wise Customer Distribution
Figure 2 - State-wise Churn Rate
Figure 3 - Customer Service Calls vs Total Day Calls
Figure 4 - Customer Service Calls vs Total Night Calls
Figure 5 - Association between International Plan and Voice Mail Plan


1. Introduction:

A telecom operator has provided us with its customer data to analyse and to find meaningful insights, in a business context, that can help the company improve its processes and its services to customers. This report summarises the statistical findings from the analysis of the telecom operator's dataset.

The dataset contains information about 3000 customers of the company. Each observation has 18 variables, such as State, the total calls made during the day, evening and night, and whether the customer has stayed with the company or churned out.

1.1 Business Problem Formulation:

- What are the states where the company has its most customers and least customers?
- What are the states where the company has seen the most churn and the least churn?
- Is there any factor that drives the number of customer service calls?
- Is there any upselling option?
- Is there any factor that drives customers churning out of the company?


2. Data Analysis:

2.1 Methodology:

Before starting the basic data analysis, it is important to check the health of the given dataset. The dataset, a csv file, looks clean at first glance but has some missing (NA) values. Hence the first and most important step is to clean the data and prepare it for analysis. Basic summary statistics can then be computed on it, after which our hypotheses can be tested and conclusions drawn. The statistical tool R is used to perform all the data analysis given in this report.

2.2 Data Cleaning:

Once the given dataset is read into the R environment as is, a basic summary of the dataset reveals its missing values.

Variable                       NA count
State                          0
VMail.Plan                     0
International.Plan             0
Total.Day.Minutes              271
Total.Day.Calls                271
Total.Day.Charge               271
Total.Evening.Minutes          264
Total.Evening.Calls            264
Total.Evening.Charge           264
Total.Night.Minutes            264
Total.Night.Calls              264
Total.Night.Charge             264
Total.International.Minutes    263
Total.International.Calls      263
Total.International.Charge     263
Customer.Service.Calls         298
Number.VMail.Messages          261
Churn                          0

Table 1 - Variables Missing Value Count


The variables State, VMail.Plan, International.Plan and Churn do not contain any missing values. Every other variable has between 261 and 298 missing values, which is roughly 10% of the dataset. Hence, instead of imputing alternative values for these missing values, the affected observations can be omitted for the rest of the data analysis. A new dataset can be created without any of these missing values.

Name                      Observations    Variables
Given Dataset (teleco)    3000            18
New Dataset (clean)       2668            18

Table 2 - Dataset Summary

It should also be noted that any analysis involving only the variables without missing values can be performed on the original dataset instead of the newly created one. Our new dataset is now ready for further exploratory data analysis.
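The cleaning step described above can be sketched in R as follows. This is a minimal illustration, not the report's actual script: the data frame is a tiny made-up stand-in whose column names follow Table 1, and the use of `colSums(is.na(...))` and `na.omit()` is an assumption about how the counts and the clean dataset could be obtained.

```r
# Toy stand-in for the telecom data; column names follow Table 1,
# but the values are made up for illustration.
teleco <- data.frame(
  State = c("WV", "MN", "CA", "WY", "GA"),
  Total.Day.Calls = c(110, NA, 95, 102, NA),
  Customer.Service.Calls = c(1, 2, NA, 0, 3),
  Churn = c("No", "No", "Yes", "No", "Yes")
)

# Count missing values per variable (the idea behind Table 1).
na_count <- colSums(is.na(teleco))
print(na_count)

# Drop every row that has at least one missing value.
clean <- na.omit(teleco)

# Compare sizes before and after (the idea behind Table 2).
print(dim(teleco))
print(dim(clean))
```

Because `na.omit()` drops a row when any variable is missing, the number of removed rows can exceed the largest single NA count, which is consistent with 332 rows being dropped in Table 2.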

2.3 State-wise Customer Distribution:

One of the most important insights for any business is its customer distribution across the different regions where it operates. Drawing a bar plot of the number of observations in each state gives this State-wise Customer Distribution chart.

Figure 1- State-wise Customer Distribution


From Fig. 1, we can infer the following:

Customer Distribution

Top 3 States      Bottom 3 States
West Virginia     California
Minnesota         Georgia
Wyoming           Iowa

Table 3 - States with most & least customers

The company's Sales and Marketing team should focus more on the states in the second column of Table 3 to increase the customer base in these regions.
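A state-wise count and bar plot of this kind could be produced along the following lines. This is a hedged sketch on made-up state labels (the counts are invented so that the ordering mirrors Table 3); the variable names are illustrative and not taken from the report's code.

```r
# Made-up state labels, one entry per customer; counts are invented.
state <- rep(c("WV", "MN", "WY", "IA", "GA", "CA"), times = c(6, 5, 4, 3, 2, 1))

# Customers per state, sorted from most to least.
counts <- sort(table(state), decreasing = TRUE)

# Bar plot comparable in spirit to Figure 1.
barplot(counts, las = 2, ylab = "Customers")

# States with the most and the fewest customers.
top3 <- names(head(counts, 3))
bottom3 <- names(tail(counts, 3))
print(top3)
print(bottom3)
```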

2.4 State-wise Churn Rate:

Customer churning is a real problem for any company. It’s very important for the

company to track the regions where the most and least churn happens.

Figure 2 - State-wise Churn Rate

Fig. 2 shows that California has the highest churn rate, with more than 20% of its customers churning out, followed by Missouri. It is also clear that Kansas is the state with almost no churn, followed by Connecticut, whose retention rate is second only to that of Kansas.
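The churn rate per state is simply the share of churned customers within each state. A minimal sketch, assuming Churn is coded as "Yes"/"No" (the report does not state the coding) and using invented rows:

```r
# Invented rows; the "Yes"/"No" coding of Churn is an assumption.
df <- data.frame(
  State = c("CA", "CA", "CA", "CA", "KS", "KS", "KS", "MO", "MO", "MO"),
  Churn = c("Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No", "No")
)

# Share of churned customers within each state.
churn_rate <- sapply(split(df$Churn == "Yes", df$State), mean)

# Highest churn first.
churn_rate <- sort(churn_rate, decreasing = TRUE)
print(round(churn_rate, 2))
```

Since Churn and State have no missing values, this calculation can run on the original dataset rather than the cleaned one.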


2.5 Drivers influencing Customer Service Calls:

Irrespective of the type of company, customers always expect good service, and for a telecom company it is very important to have an excellent customer service team. To scale the customer service team in line with customer requirements, the company has to understand what drives calls to customer service. Hence we build a scatter plot to understand the relationship between the number of day calls and the number of customer service calls.

Figure 3 - Customer Service Calls vs Total Day Calls

Fig 3 suggests that the number of customer service calls a customer makes has a

positive relationship with the total number of day calls the same customer makes.


A similar relationship can be seen between the number of customer service calls

and the number of night calls.

Figure 4 - Customer Service Calls vs Total Night Calls

From Fig. 3 and Fig. 4 together, we can infer that the total number of calls a customer makes, irrespective of when they are made (day, evening or night), has an influence on the number of customer service calls that customer makes.
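A relationship read off a scatter plot can be quantified with a correlation coefficient or a regression slope. The sketch below uses simulated data in which service calls are constructed to rise with day calls, so it only illustrates the mechanics; it is not the report's data, and the report itself presents only the plots.

```r
set.seed(1)  # reproducible simulated data

# Simulated call counts; service calls are built to rise with day calls.
day_calls <- rpois(200, lambda = 100)
service_calls <- rpois(200, lambda = day_calls / 50)

# Scatter plot comparable in spirit to Figure 3.
plot(day_calls, service_calls,
     xlab = "Total day calls", ylab = "Customer service calls")

# Pearson correlation and its p-value.
r <- cor(day_calls, service_calls)
p_value <- cor.test(day_calls, service_calls)$p.value

# Slope of a simple linear fit: extra service calls per extra day call.
fit <- lm(service_calls ~ day_calls)
slope <- coef(fit)[["day_calls"]]
print(c(r = r, p_value = p_value, slope = slope))
```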

2.6 Optional Offerings Associations:

According to the given dataset, the company has two optional offerings: 1. Voice Mail Plan and 2. International Plan. Here we explore whether there is any association between these two variables so that the company can plan to bundle or upsell them.

To find the association between these two variables, an association rule has been built:

{International.Plan=Yes} => {VMail.Plan=Yes}

Support: 0.8053333 Confidence: 0.8981413


The support is approximately 0.80, which means that over 80% of all customers hold both plans together. To understand this better, a mosaic plot is drawn between the two variables.

Figure 5 - Association between International Plan and Voice Mail Plan

A cross tabulation of the two variables shows the customer preference as a proportion of all customers.

                         Voice Mail Plan
International Plan       No        Yes
No                       0.01      0.09
Yes                      0.09      0.81

Table 4 - Crosstab between VMail Plan & International Plan

About 81% of the customers have taken both plans, while each single-plan group (International Plan only, or Voice Mail Plan only) makes up only about 9% of the current customer base.
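The support and confidence of the rule can be recomputed directly from the joint proportions in Table 4. The sketch below uses the rounded table values, so the results differ slightly from the unrounded figures quoted for the rule; the row/column orientation is an assumption (the table is symmetric, so it does not affect the numbers). Lift is added as a common companion measure that the report does not mention.

```r
# Joint proportions copied from Table 4 (rounded values).
# Rows: International Plan, columns: Voice Mail Plan (orientation assumed).
tab <- matrix(
  c(0.01, 0.09,
    0.09, 0.81),
  nrow = 2, byrow = TRUE,
  dimnames = list(International = c("No", "Yes"), VMail = c("No", "Yes"))
)

# Rule: {International.Plan=Yes} => {VMail.Plan=Yes}
# Support: share of all customers holding both plans.
support <- tab["Yes", "Yes"]

# Confidence: share of International Plan holders who also hold Voice Mail.
confidence <- support / sum(tab["Yes", ])

# Lift: confidence relative to the overall share of Voice Mail holders.
# A value near 1 would mean the rule adds little beyond that base rate.
lift <- confidence / sum(tab[, "Yes"])

print(c(support = support, confidence = confidence, lift = lift))
```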


2.7 Factors driving Churn:

The most worrying situation for any company is when its customers leave. For a telecom company in particular, customer churn can happen for many reasons. Here we explore some of the factors that could drive customer churn. We state a Null Hypothesis for each candidate factor and then either reject it statistically or conclude that we failed to reject it.

A common assumption among experts is that customers who churn out make significantly more calls to customer service than customers who stay.

Hypothesis 1:

Null Hypothesis: There is no significant difference between the number of customer service calls made by customers who churn out and by customers who stay.

Alternative Hypothesis: There is a significant difference between the number of customer service calls made by customers who churn out and by customers who stay.

To test this hypothesis, we can run a one-way ANOVA (completely randomised design) with the number of customer service calls as the response and customer churn as the factor.

fit <- aov(Customer.Service.Calls ~ Churn, data = clean)

The summary of the model reports an F-value of 0.032 and a p-value of 0.857. At the 5% significance level (95% confidence level), the p-value is not less than 0.05, hence we fail to reject the Null Hypothesis. In other words, the data show no significant difference between the customer service calls made by customers who churned out and those made by customers who stayed.
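The F-value and p-value can be read from the model summary programmatically. The sketch below runs on simulated data, so its numbers will not match the 0.032 and 0.857 reported above; the variable names mirror the report, while the simulation parameters are arbitrary assumptions.

```r
set.seed(42)  # reproducible simulated data

# Simulated stand-in for the cleaned dataset; calls are generated
# independently of churn, purely for illustration.
clean <- data.frame(
  Customer.Service.Calls = rpois(300, lambda = 1.5),
  Churn = factor(sample(c("No", "Yes"), 300, replace = TRUE,
                        prob = c(0.85, 0.15)))
)

# One-way ANOVA of service calls by churn status.
fit <- aov(Customer.Service.Calls ~ Churn, data = clean)

# The first element of the summary is the ANOVA table.
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Decision at the 5% significance level.
reject_null <- p_value < 0.05
print(c(F = f_value, p = p_value))
print(reject_null)
```

With only two groups, this one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances.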

Hypothesis 2:

Null Hypothesis: The factor variable Churn is independent of the factor Voice Mail Plan

Alternative Hypothesis: The factor variable Churn is dependent on the factor Voice Mail Plan


As we have to test the independence of two factor variables, we perform a Chi-Square test of independence.

chisq.test(table(teleco$Churn, teleco$VMail.Plan))

The Chi-Square test gives a p-value of 0.9368. At the 5% significance level (95% confidence level), this p-value is not less than 0.05, so we fail to reject the Null Hypothesis: the data provide no evidence that Churn depends on the Voice Mail Plan.

Hypothesis 3:

Null Hypothesis: The factor variable Churn is independent of the factor International Plan

Alternative Hypothesis: The factor variable Churn is dependent on the factor International Plan

As we again have to test the independence of two factor variables, we perform a Chi-Square test of independence.

chisq.test(table(teleco$Churn, teleco$International.Plan))

The Chi-Square test gives a p-value of 0.9807. At the 5% significance level (95% confidence level), this p-value is not less than 0.05, so we fail to reject the Null Hypothesis: the data provide no evidence that Churn depends on the International Plan.
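Both independence tests follow the same pattern, so they can be run through one small helper. The sketch below uses simulated factors drawn independently of each other; the helper name `p_for` and the simulation probabilities are illustrative assumptions, and the resulting p-values will not match those reported above.

```r
set.seed(7)  # reproducible simulated data

# Simulated stand-in for the original dataset; all three factors are
# drawn independently, purely for illustration.
n <- 500
teleco <- data.frame(
  Churn = sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.85, 0.15)),
  VMail.Plan = sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.1, 0.9)),
  International.Plan = sample(c("No", "Yes"), n, replace = TRUE,
                              prob = c(0.1, 0.9))
)

# Hypothetical helper: p-value of the Chi-Square test of Churn vs one plan.
p_for <- function(plan) {
  chisq.test(table(teleco$Churn, teleco[[plan]]))$p.value
}

p_values <- sapply(c("VMail.Plan", "International.Plan"), p_for)
print(p_values)

# TRUE where the Null Hypothesis of independence is rejected at the 5% level.
print(p_values < 0.05)
```

A p-value above 0.05 means the test found no evidence of dependence; it does not by itself prove that the two variables are independent.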


3. Conclusion:

Key conclusions that can be made from the above data analysis:

- West Virginia has the most customers, while the company's customer base is smallest in California, where it has to grow.
- The company has its best customer retention rate, 100%, in Kansas, while it has its worst churn rate, over 20%, in California, where it has to improve a lot to retain its customers.
- The number of customer service calls is directly influenced by the total number of calls (day, evening and night calls) made by a customer; hence the company has to increase its number of customer service executives when there is a spike in the number of calls.
- There is a large upselling opportunity for the company, as there is a strong association between the Voice Mail Plan and the International Plan. For any new customer, the company can either offer a bundle package, since customers mostly prefer both, or offer a discount on the second plan to upsell it quickly.
- There is no significant relationship between customer service calls and customer churn.
- There is also no significant dependence between customer churn and the International Plan or the Voice Mail Plan.
- Hence it can be concluded that none of the prominent factors given in the dataset drives customer churn, and there must be some other factor that makes customers churn out. The company has to provide more customer data and other variables, such as joining date and leaving date, for us to try predicting churn.