Business Intelligence Using SAS Final Presentation

Post on 08-Jul-2015

Bank Marketing Project

Group 7: Zhaodi Liu

Preete Dixit

Nandini Naik

Rashmi Nadubeedi Ramesh

Pravin Kumar Prem Kumar

Agenda

• Project Motivation

• Data Description

• Our BI Models

• Experimental Results

• Association Rule Mining

• Managerial Implications

• Challenges

• Conclusion

Project Motivation

• Direct marketing targets customers directly with a personalized message, as opposed to mass marketing

• The primary benefits to businesses:
  – Increased lead generation
  – Increased sales volume
  – Increased customer base
  – Minimized losses

• Focus on generating more "qualified" leads

Impact of Data Mining

• Can be very effective for direct marketing

• Sophisticated algorithms generate rules, determine the most useful attributes and predict future outcomes

• Our goal is to predict the probability of a client subscribing to the term deposit

• In the interest of:
  – Boosting sales to existing customers
  – Increasing customer loyalty
  – Recapturing old customers and generating new business

Data Description

• The data relates to the direct marketing campaigns (phone calls) of a Portuguese banking institution

Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Number of Instances: 45211
Number of Attributes: 17
Area: Business
Date Donated: 2012-02-14

BANK CLIENT DATA

Serial No  Name       Data Type    Description
1.         age        numeric      Client’s age
2.         job        categorical  type of job
3.         marital    categorical  marital status
4.         education  categorical  level of education
5.         default    binary       has credit in default?
6.         balance    numeric      average yearly balance in euros
7.         housing    binary       has housing loan?
8.         loan       binary       has personal loan?

DATA RELATED WITH THE LAST CONTACT OF THE CURRENT CAMPAIGN

Serial No  Name       Data Type    Description
9.         contact    categorical  contact communication type
10.        day        numeric      last contact day of the month
11.        month      categorical  last contact month of year
12.        duration   numeric      last contact duration, in seconds
13.        campaign   numeric      number of contacts performed during this campaign and for this client
14.        pdays      numeric      number of days that passed by after the client was last contacted from a previous campaign
15.        previous   numeric      number of contacts performed before this campaign and for this client
16.        poutcome   categorical  outcome of the previous marketing campaign

OUTPUT VARIABLE (DESIRED TARGET)

Serial No  Name       Data Type    Description
17.        y          binary       has the client subscribed a term deposit

BI Model - With Target Profile

Profit/Loss Matrix

Decision Tree

Regression

Regression Node: Converts categorical values to interval values using dummy variables. Logically related categories are grouped to reduce the number of independent variables in the regression equation.
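The two steps described for the Regression node can be sketched in plain Python. This is only an illustration of the idea, not what SAS Enterprise Miner does internally: the grouping of education levels below is hypothetical (the slides do not list the actual groups), and the helper names are made up for this example.

```python
from typing import Dict, List

# Hypothetical grouping of logically related education levels.
EDUCATION_GROUPS: Dict[str, str] = {
    "primary": "basic",
    "secondary": "basic",
    "tertiary": "higher",
    "unknown": "unknown",
}


def group_categories(values: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each category by its group; unmapped categories stay unchanged."""
    return [mapping.get(value, value) for value in values]


def dummy_encode(values: List[str]) -> List[Dict[str, int]]:
    """Encode a categorical column as 0/1 dummy variables.

    The alphabetically first level is used as the reference level and gets
    no dummy, so k levels produce k - 1 dummy variables.
    """
    levels = sorted(set(values))
    kept_levels = levels[1:]  # drop the reference level
    return [
        {f"is_{level}": int(value == level) for level in kept_levels}
        for value in values
    ]


education = ["primary", "tertiary", "secondary", "unknown"]
grouped = group_categories(education, EDUCATION_GROUPS)
encoded = dummy_encode(grouped)
```

Grouping four levels into three reduces the dummy variables for this input from three to two, which is the reduction in independent variables the slide refers to.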

Neural Network

The model converged after 70 iterations and contains 124 weights.

Neural Network with Input Selection

• Reducing the number of modelling inputs reduces the number of modelling weights as well as computational costs and possibly improves the model performance.

• The useful inputs are selected by connecting the Neural Network Node to the Regression Node.

The model converged after 42 iterations and contains 46 weights.
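The link between the number of inputs and the number of weights can be made concrete with a small calculation. The slides do not state the network architecture, so the sketch below assumes a single hidden layer with three hidden units, one output unit, and a bias on every unit; under that assumption the reported counts of 124 and 46 weights would correspond to 39 and 13 inputs respectively. The function name is hypothetical.

```python
def mlp_weight_count(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Count the weights of a fully connected network with one hidden layer.

    Each hidden unit has one weight per input plus a bias, and each output
    unit has one weight per hidden unit plus a bias.
    """
    hidden_layer_weights = (n_inputs + 1) * n_hidden
    output_layer_weights = (n_hidden + 1) * n_outputs
    return hidden_layer_weights + output_layer_weights


# Assumed architecture: 3 hidden units, 1 output unit (not stated in the slides).
all_inputs = mlp_weight_count(n_inputs=39, n_hidden=3)
selected_inputs = mlp_weight_count(n_inputs=13, n_hidden=3)
```

Because the weight count grows linearly with the number of inputs, dropping inputs shrinks the model roughly in proportion.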

BI Model - Without Target Profile

Decision Tree

Regression

Neural Network

The model converged after 70 iterations and contains 124 weights.

Neural Network with Input Selection

The model converged after 76 iterations and contains 85 weights.

Model Assessment and Scoring results with Target Profile Model

– The performance of the four models is compared on average profit using the Model Comparison node.

Fit Statistics

ROC Plots

Confusion Matrix

Scoring

– Scoring applies the model deemed best by the Model Comparison node to predict the outcome for a new case/observation whose outcome is unknown.

Replaced Variables:
– Job – management
– Education – Secondary
– Contact – Cellular

Rejected Variables:
– Poutcome
– Target y

Actual Data Scores:
– Percentage No = 88.476%
– Percentage Yes = 11.524%

There is a slight difference of 1.725% between the prediction model outcome and the actual outcome.

Model Assessment and Scoring results without Target Profile Model

– The performance of the four models is compared on misclassification rate using the Model Comparison node.

Fit Statistics

ROC Plots

Confusion Matrix

Scoring

Actual Data Scores:
– Percentage No = 88.476%
– Percentage Yes = 11.524%

There is a slight difference of 1.725% between the prediction model outcome and the actual outcome.

Association Rule Mining

Data Pre-processing

• Default : D(Yes), D(No)

• Housing : H(Yes), H(No)

• Personal Loan : PL(Yes), PL(No)

• Age : 20-40, 40-60, 60-90, and 90-100
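The pre-processing above turns each client record into a set of items that an association rule miner can work with. The sketch below shows one way to do that and to compute the support and confidence of a rule. It is an illustration only: the `Age(...)` item labels, the handling of band boundaries, and the sample records are assumptions, not taken from the project.

```python
from typing import Any, Dict, List, Optional, Set

# Age bands from the slide.
AGE_BINS = [(20, 40), (40, 60), (60, 90), (90, 100)]


def age_bin(age: int) -> Optional[str]:
    """Map an age to its band label, or None if it falls outside every band."""
    for index, (low, high) in enumerate(AGE_BINS):
        is_last = index == len(AGE_BINS) - 1
        # Assumption: bands are [low, high), except the last, which includes 100.
        if low <= age < high or (is_last and age == high):
            return f"Age({low}-{high})"
    return None


def to_items(record: Dict[str, Any]) -> Set[str]:
    """Turn one client record into items such as D(No), H(Yes), PL(No)."""
    items = {
        f"D({str(record['default']).capitalize()})",
        f"H({str(record['housing']).capitalize()})",
        f"PL({str(record['loan']).capitalize()})",
    }
    band = age_bin(int(record["age"]))
    if band is not None:
        items.add(band)
    return items


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


# Made-up records, used only to exercise the functions.
records = [
    {"default": "no", "housing": "yes", "loan": "no", "age": 35},
    {"default": "no", "housing": "yes", "loan": "yes", "age": 45},
    {"default": "no", "housing": "no", "loan": "no", "age": 62},
    {"default": "yes", "housing": "yes", "loan": "no", "age": 38},
]
transactions = [to_items(record) for record in records]
rule_confidence = confidence(transactions, {"H(Yes)"}, {"PL(No)"})
```

Note that ages below 20 fall outside the listed bands; this sketch simply leaves the age item out for such clients.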

Results & Interpretation

Managerial Implications

if pdays < 19.5 or MISSING
AND month IS ONE OF: MAY, JUN, JUL, AUG, NOV, JAN or MISSING
AND duration < 348.5 or MISSING
AND age < 60.5 or MISSING
then Predicted: y=YES = 0.02, Predicted: y=NO = 0.98

If 100 customers fall into this segment and the cost of calling a customer is $12, the bank saves $1,200 simply by not contacting them.

if pdays < 19.5 or MISSING
AND month IS ONE OF: FEB
AND housing IS ONE OF: NO
AND duration < 466.5 or MISSING
AND day < 20.5 AND day >= 9.5
AND age < 60.5 or MISSING
then Predicted: y=YES = 0.75, Predicted: y=NO = 0.25

If 100 customers fall into this segment, the cost of calling a customer is $12 and the profit per subscriber is $100, the bank could generate revenue of $10,000.
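The two decision-tree rules and the arithmetic behind the quoted figures can be written out as a short sketch. The function names and the dictionary-based customer records are hypothetical; the thresholds, the $12 call cost, the $100 profit, and the segment size of 100 come from the slide. Missing values are represented as `None`.

```python
from typing import Any, Dict, Optional

LOW_RESPONSE_MONTHS = {"MAY", "JUN", "JUL", "AUG", "NOV", "JAN"}


def below_or_missing(value: Optional[float], threshold: float) -> bool:
    """True when the value is missing or strictly below the threshold."""
    return value is None or value < threshold


def month_of(customer: Dict[str, Any]) -> Optional[str]:
    """Return the contact month in upper case, or None when missing."""
    month = customer.get("month")
    return None if month is None else str(month).upper()


def is_low_response(customer: Dict[str, Any]) -> bool:
    """First rule: predicted y=YES = 0.02, so these customers are not called."""
    month = month_of(customer)
    return (
        below_or_missing(customer.get("pdays"), 19.5)
        and (month is None or month in LOW_RESPONSE_MONTHS)
        and below_or_missing(customer.get("duration"), 348.5)
        and below_or_missing(customer.get("age"), 60.5)
    )


def is_high_response(customer: Dict[str, Any]) -> bool:
    """Second rule: predicted y=YES = 0.75, so these customers are targeted."""
    day = customer.get("day")
    return (
        below_or_missing(customer.get("pdays"), 19.5)
        and month_of(customer) == "FEB"
        and customer.get("housing") == "no"
        and below_or_missing(customer.get("duration"), 466.5)
        and day is not None
        and 9.5 <= day < 20.5
        and below_or_missing(customer.get("age"), 60.5)
    )


SEGMENT_SIZE = 100          # customers per segment, as on the slide
CALL_COST = 12              # dollars per call
PROFIT_PER_SUBSCRIBER = 100  # dollars

savings = SEGMENT_SIZE * CALL_COST               # calls avoided in the first segment
revenue = SEGMENT_SIZE * PROFIT_PER_SUBSCRIBER   # slide's figure for the second segment
```

The revenue figure follows the slide in counting all 100 customers of the second segment. Weighting by the 0.75 predicted probability and subtracting call costs would give a more conservative estimate, but that refinement is not part of the presentation.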

Decision Tree Model – Our best-fit model for maximizing profits

Decision Tree: Confusion Matrix

                 Predicted Positive  Predicted Negative
Actual Positive  1314                1330
Actual Negative  863                 19098

Decision Tree: Profit/Loss Matrix

                 Predicted Positive  Predicted Negative
Actual Positive  $15,768             $0
Actual Negative  -$10,356            $0
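The dollar amounts above can be reproduced from the decision tree's confusion matrix. The per-case values used below ($12 for a true positive, -$12 for a false positive, $0 otherwise) are not stated on the slide; they are inferred by dividing each dollar amount by the matching count, so treat them as an assumption. The sketch also computes the misclassification rate, the criterion used for the models without a target profile.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (actual, predicted)

# Decision tree confusion matrix from the slide.
CONFUSION: Dict[Cell, int] = {
    ("positive", "positive"): 1314,
    ("positive", "negative"): 1330,
    ("negative", "positive"): 863,
    ("negative", "negative"): 19098,
}

# Assumed per-case profit, inferred from 15768 / 1314 and -10356 / 863.
UNIT_PROFIT: Dict[Cell, int] = {
    ("positive", "positive"): 12,
    ("positive", "negative"): 0,
    ("negative", "positive"): -12,
    ("negative", "negative"): 0,
}


def profit_matrix(confusion: Dict[Cell, int], unit: Dict[Cell, int]) -> Dict[Cell, int]:
    """Multiply each confusion-matrix count by its per-case profit."""
    return {cell: count * unit[cell] for cell, count in confusion.items()}


def average_profit(confusion: Dict[Cell, int], unit: Dict[Cell, int]) -> float:
    """Total profit divided by the number of cases."""
    total_profit = sum(profit_matrix(confusion, unit).values())
    return total_profit / sum(confusion.values())


def misclassification_rate(confusion: Dict[Cell, int]) -> float:
    """Share of cases whose predicted class differs from the actual class."""
    wrong = sum(count for (actual, predicted), count in confusion.items()
                if actual != predicted)
    return wrong / sum(confusion.values())


profits = profit_matrix(CONFUSION, UNIT_PROFIT)
error_rate = misclassification_rate(CONFUSION)
mean_profit = average_profit(CONFUSION, UNIT_PROFIT)
```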

Regression Model

• If pdays increases by 1 unit, it has practically no impact on the odds of not subscribing to the term deposit, because the corresponding odds ratio is approximately 1:

$e^{\hat{\beta}_{\text{pdays}}} \approx 1$
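The interpretation rests on how logistic regression coefficients translate into odds: a one-unit increase in an input multiplies the odds by the exponential of its coefficient. The sketch below illustrates this with a hypothetical near-zero coefficient; the fitted value from the project is not reproduced here.

```python
import math


def odds_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Multiplicative change in the odds when the input increases by delta."""
    return math.exp(coefficient * delta)


def percent_change_in_odds(coefficient: float, delta: float = 1.0) -> float:
    """Percentage change in the odds for an increase of delta in the input."""
    return (odds_ratio(coefficient, delta) - 1.0) * 100.0


# Hypothetical coefficient close to zero, for illustration only.
HYPOTHETICAL_PDAYS_COEFFICIENT = 0.0001

ratio = odds_ratio(HYPOTHETICAL_PDAYS_COEFFICIENT)
change = percent_change_in_odds(HYPOTHETICAL_PDAYS_COEFFICIENT)
```

A coefficient this close to zero yields an odds ratio within a fraction of a percent of 1, which is why a one-unit change in such an input leaves the odds essentially unchanged.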

Challenges

• To implement the Profit/Loss matrix

• Absence of ROC plot in the result of Model Assessment

• Non-convergence of Neural Network

• Scoring

Conclusion

• Successfully implemented 2 predictive analysis models to predict the outcome of term deposit subscription

• Decision Tree is the best-fit model based on Profit/Loss

• Decision Tree is the best-fit model based on Misclassification Rate

• Using the decision rules:
  – Results in a saving of $1,200
  – Generates a revenue of $10,000

• Using the Profit/Loss Matrix:
  – Profit of $15,768
  – Savings of $10,356

Q&A
