Science in Business Data Mining?


Upload: grover

Post on 09-Jan-2016


TRANSCRIPT

Page 1: Science in Business Data Mining?

Science in Business Data Mining?

Background: support managerial decision making

Is there a science to data mining (with CI-methods)?

Outline
1. Data Mining in Business & Management
2. Rules established in Business practices vs. Data mining?
   1. Statistics vs. Data driven modelling
   2. A personal view
3. How to develop meta-knowledge

Sven F. Crone, Lancaster University Management School, Research Centre for Forecasting

YES, but it depends
(and it may be empirical wizardry driven by efficiency rather than effectiveness!)

Page 2: Science in Business Data Mining?

Business Data Mining?

Main areas for Data Mining:
• Finance: Credit risk (personal & corporate)
• Marketing: Customer Relationship Management (=Direct Marketing, Database Marketing)

[Figure: customer lifecycle diagram. Stages: Prospect, New Customer, Established Customer, Former Customer. Customer types: Initial, High value, High potential, Low value; exits: Voluntary Churn, Forced Churn. Phases: Acquisition, Activation, Relationship, Retention, Resignation. Demand: Target Market, Individual Demand, Aggregate Demand. Tasks: Marketing Response, Adoption, Market Experiments, Intentions, Extrapolative Forecasting (incl. Judgement), Churn Prediction, Credit Scoring, Direct Marketing.]

adapted from Berry and Linoff (2004) and Olafson et al (2006)

Page 3: Science in Business Data Mining?

Best practices

Credit Scoring
• Small & balanced classes
  • Use 2000 of minority class
  • Use undersampling
• Discretise all (!) variables
  • Binary dummies / WOE to capture non-linearity
• Use logistic regression
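The credit scoring practices listed above (undersampling to balanced classes, then discretising variables and coding them by weight of evidence) can be sketched as follows. This is a minimal illustration, not the author's procedure: the function names, the row-as-dict layout, the `smoothing` parameter and the sign convention (log of good share over bad share; some texts use the inverse) are all assumptions. The resulting WOE values would then replace the binned variable as an input to a logistic regression, which is not shown here.

```python
import math
import random
from collections import Counter


def undersample(rows, label_key, minority_label, n_minority=2000, seed=0):
    """Balanced sample: up to n_minority minority rows plus equally many majority rows.

    `rows` is a list of dicts (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == minority_label]
    majority = [r for r in rows if r[label_key] != minority_label]
    k = min(n_minority, len(minority), len(majority))
    return rng.sample(minority, k) + rng.sample(majority, k)


def weight_of_evidence(bins, labels, bad_label=1, smoothing=0.5):
    """WOE per bin: ln(share of goods in the bin / share of bads in the bin).

    `smoothing` is added to every count so that empty cells do not
    produce log(0); it is an assumption of this sketch.
    """
    goods = Counter(b for b, y in zip(bins, labels) if y != bad_label)
    bads = Counter(b for b, y in zip(bins, labels) if y == bad_label)
    categories = set(goods) | set(bads)
    total_good = sum(goods.values()) + smoothing * len(categories)
    total_bad = sum(bads.values()) + smoothing * len(categories)
    return {
        c: math.log(
            ((goods[c] + smoothing) / total_good)
            / ((bads[c] + smoothing) / total_bad)
        )
        for c in categories
    }
```

Under this convention, a bin with a positive WOE holds relatively more goods than bads, and a bin with a negative WOE the reverse. Because each bin gets its own value, a non-monotonic relationship between the raw variable and risk can still enter a linear model.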

Cross-Selling
• Large & imbalanced sample
  • Use large sample sizes
  • Original (imbalanced) class distribution
• …

A personal view:
• Data selection is best using prior domain knowledge (use filters)
• Pre-processing more important than method [Crone et al, 2006; Keogh 2002]
• (Balanced) sampling & pre-processing is method dependent
• Best practices exist & are domain dependent (e.g. homogeneous datasets in credit scoring)
• Flat Maximum effect [Lovie & Lovie, 1986]


GAP

Extensive use of expert domain knowledge: efficient solution ≠ best

Practitioners & Consultants use statistics

Page 4: Science in Business Data Mining?

How to derive (meta)-knowledge?

Lessons from other disciplines: Time Series Forecasting
• More "evidence-based methods" [Armstrong 2000]

Empirical Evidence
• Conditions under which methods perform well (multiple hypotheses)
• Domain specific competitions (valid & reliable)
• Multiple out-of-sample evaluations (≠ single fold, one origin)
• Multiple homogeneous datasets from one domain
• Use of valid benchmark methods & unbiased error measures
• Honour the domain & decision context (active learning, cost sensitive)
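The call for multiple out-of-sample evaluations against a valid benchmark can be illustrated with a rolling-origin evaluation, as used in time series forecasting. The sketch below is an assumption-laden example rather than a prescribed protocol. The function names are hypothetical. Mean absolute error and the naive (last value) benchmark are picked here only for illustration, since the slides name no specific error measure or benchmark.

```python
def naive_forecast(history, horizon):
    """Benchmark: repeat the last observed value for every step ahead."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Candidate method for the example: forecast the historical mean."""
    return [sum(history) / len(history)] * horizon


def rolling_origin_mae(series, forecaster, min_train, horizon=1):
    """Mean absolute error averaged over every admissible forecast origin.

    Each origin uses only the observations before it for fitting, so every
    error is out-of-sample, and the result does not hinge on a single split.
    """
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        history = series[:origin]
        actual = series[origin:origin + horizon]
        forecast = forecaster(history, horizon)
        errors.extend(abs(a - f) for a, f in zip(actual, forecast))
    return sum(errors) / len(errors)


def relative_mae(series, forecaster, benchmark, min_train, horizon=1):
    """Ratio below 1 means the candidate beats the benchmark on this series."""
    return (rolling_origin_mae(series, forecaster, min_train, horizon)
            / rolling_origin_mae(series, benchmark, min_train, horizon))
```

For the trending series `[1, 2, 3, 4, 5, 6]` with `min_train=3`, the naive benchmark is wrong by 1 at each of the three origins, while the mean forecast is wrong by 2, 2.5 and 3, so the relative MAE is 2.5 and the candidate loses to the benchmark. Repeating the comparison over many homogeneous series from one domain gives the kind of evidence the list above asks for.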

Replications
• Studies must allow replications: document all steps / parameters

STOP FINE-TUNING / MARGINAL EXTENSION OF SINGLE METHOD ON SINGLE TOY DATASET

Develop solutions for domain (Why make life harder?)

Where to start? Follow high impact approach!
• Identify most prominent application domains (e.g. credit risk)
• Select promising application domains for CI-methods
• Get corporate sponsor & run competition
• Analyse conditions (!) using meta-studies!
• Embed findings as methodology in SOFTWARE

Page 5: Science in Business Data Mining?

Literature
• Ian Ayres (2007) Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart, Bantam
• Thomas H. Davenport, Jeanne G. Harris (2007) Competing on Analytics: The New Science of Winning, Harvard Business School Press
• Fildes, Nikolopoulos, Crone, Syntetos (2009) Forecasting and Operational Research – a Review, JORS, forthcoming
• Finlay, Crone (under review) Sampling issues in Credit Scoring – the effect of sample size and sample distribution on predictive accuracy, EJOR
• Keogh, Kasetty (2002, 2004) On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration, SIGKDD'02 & Data Mining Journal