baqmar 2008

158

Upload: baqmar

Post on 07-Nov-2014

8.186 views

Category:

Technology


4 download

DESCRIPTION

BAQMaR Conference 2008

TRANSCRIPT

Page 1: BAQMaR 2008
Page 2: BAQMaR 2008
Page 3: BAQMaR 2008
Page 4: BAQMaR 2008

600+ analysts

Page 5: BAQMaR 2008

Our dreams

Page 6: BAQMaR 2008

Inspiration

Page 7: BAQMaR 2008

Friend organisations

Marketing research domains

Neighbour countries

Online & offline

Page 8: BAQMaR 2008

website!

Page 9: BAQMaR 2008

Jointhe conversation !

Page 11: BAQMaR 2008

C L

PopularUnique

Well known

Page 12: BAQMaR 2008
Page 13: BAQMaR 2008
Page 14: BAQMaR 2008

An explosion of data !

Page 15: BAQMaR 2008
Page 16: BAQMaR 2008

Fatigue among respondents !

Page 17: BAQMaR 2008
Page 18: BAQMaR 2008

Me, MySpace & IVisual (n) ethnography among 13-17 year olds

Joeri Van den Bergh (InSites Consulting)

Veerle Colin (MTV Networks)

Page 19: BAQMaR 2008
Page 20: BAQMaR 2008

WHY TELL ME WHY

The Research Briefing

Page 21: BAQMaR 2008

Getting intimate with our target groups

ID construction of adolescents in a

digitalised world

Social groups as an extension of

psychographic segments

Role of brands in ID construction within

social groups

Page 22: BAQMaR 2008

I SEE YOU, BABY

The Research Approach

Page 23: BAQMaR 2008

Hawthorne effect

Researcher

gazeTime & Cost

intensive

Traditional

ethnography

context

Participant

R

Page 24: BAQMaR 2008

Hawthorne effect

Researcher

gazeTime & Cost

intensive

Traditional

ethnography

context

Participant

R

360° visual

ethnography

Contact with researcher

via 2.0 tools“Informer” gaze: what is

important?Follow multiple participants from anywhere

R

Participant

context

From traditional to visual ethnography

Page 25: BAQMaR 2008

1.User generated MM ethnography

= Observation takes place via photos/video

taken by the participants to the study

= Participants observe their own

environment & report back to the

researcher

360° Ethnography

Page 26: BAQMaR 2008
Page 27: BAQMaR 2008

Personal mePictures of...

Place where you can

really be yourself

Clothes you wear at

home

Objects that are

typical for me

Non mePictures of...

Clothes you no

longer want to

wear

Youngsters where

you so not want to

be friend with

Aspirational me

Pictures of...

Favorite clothes

youngsters where

you would like to

be friend with

Social mePictures of...

Important persons

in your life

Clothes you wear

to go out

Groups of

youngsters you like

Other groups

Pictures of other groups that

are different but okay

Pictures of youth of today

Pictures of normal persons

Ethnographic blog: identity related tasks

Page 28: BAQMaR 2008

1.User generated ethnography

= Observation takes place via photos/video taken

by the participants to the study

= Participants observer their own environment &

report back to the researcher

2.Nethnography: social life of teenagers is very

much MOVING ONLINE !!!

= observation of the online behaviour & content

of a target group or within a certain webspace

360° Ethnography

Page 29: BAQMaR 2008

Nethnography: we are your friends

Page 30: BAQMaR 2008

Personal

identity

Nicknam

e

Profile

text

Profile

picture

Photo

collection

Clan member

ship

Conversation

Monitoring on

guestbook

Social

identity

Social Network Sites: nethnography

Page 31: BAQMaR 2008

And for the datamining freaks among you, Annelies ripped the internet

• 300 active participants of netlog randomly

selected. Equal spread age x gender

• Webcrawlers to „scan‟ pages of netlog and

substract content.

• Textmining: profile pages – photo tags –

clan membership – conversations on the

guestbook

Page 32: BAQMaR 2008

Other online behaviour: tracking tool

Page 34: BAQMaR 2008

THE HARD PART

The Research Analysis

Page 35: BAQMaR 2008

Analysis

INTIMACY

Page 36: BAQMaR 2008
Page 37: BAQMaR 2008

ME-SEUM

Page 38: BAQMaR 2008

Social Networks are not so social as you think they are

Only 32% of the conversations on the guestbook

are „interactive‟!

The rest are all single statements.

Feedback on picture

Congratulations

School

Confirm friendship

MSN

Express love

Miss you

Practical appointment

It was great

Party

Making fun

Family

Study

How are you?

Age

Music

Online movies

Transport

I’ am bored

Feedback on profiles

Festivals

Sport

Mobile phones

Welcome by strangers

Youth movement Alcohol

Food

Litte kids

Quarrel

Gaming

Food

Sleeping

Tokio hotel

Travel

Cars

Shoes

Page 39: BAQMaR 2008

one of the 3 key research questions

Social groups among today‟s youngsters

Page 40: BAQMaR 2008

To what

youngster

group

do I belong?

Page 41: BAQMaR 2008

Methodology

Aspira

tional

Group

Social

Group

Differe

nt but

OK

Non

group

M

E

Page 42: BAQMaR 2008

WE ME (I’m better)T

HIN

KD

RIN

KCHANGE

CONSERVATISM

REAL WORLD

FANTASY

Fashion girlFashion boy

Breezer sluts

Jumpers

Nerds

Geek girls

emo

Alternative

Punk

Gothic

RockersTektonic

Skater

Hippies

Rapper

MAINSTREA

M

Page 43: BAQMaR 2008

spartacus121 best pk (runescape)

30/04/2008 19:07:55 door laurent

ik hit iets in de 27 en dat is al zot hoog zij hits zijn

gestoord lol. zen zwaard is 140M (ik heb 10 M)

Page 44: BAQMaR 2008

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

WE ME (I’m better)T

HIN

KD

RIN

KCHANGE

CONSERVATISM

REAL WORLD

FANTASY

Fashion girlFashion boy

Breezer sluts

Jumpers

Nerds

Geek girls

emo

Alternative

Punk

Gothic

RockersTektonic

Skater

Hippies

Rapper

Page 45: BAQMaR 2008

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

WE ME (I’m better)T

HIN

KD

RIN

KCHANGE

CONSERVATISM

REAL WORLD

FANTASY

Fashion girlFashion boy

Breezer sluts

Jumpers

Nerds

Geek girls

emo

Alternative

Punk

Gothic

RockersTektonic

Skater

Hippies

Rapper

Page 46: BAQMaR 2008

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

WE ME (I’m better)T

HIN

KD

RIN

KCHANGE

CONSERVATISM

REAL WORLD

FANTASY

Fashion girlFashion boy

Breezer sluts

Jumpers

Nerds

Geek girls

emo

Alternative

Punk

Gothic

RockersTektonic

Skater

Hippies

Rapper

Page 47: BAQMaR 2008
Page 48: BAQMaR 2008

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS

LOOKS LOOKS

LOOKS

LOOKS

LOOKS

LOOKSLOOKS

LOOKS

LOOKS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS

SKILLS SKILLS

SKILLS

SKILLS

SKILLS

SKILLSSKILLS

SKILLS

SKILLS

WE ME (I’m better)T

HIN

KD

RIN

KCHANGE

CONSERVATISM

REAL WORLD

FANTASY

Fashion girlFashion boy

Breezer sluts

Jumpers

Nerds

Geek girls

emo

Alternative

Punk

Gothic

RockersTektonic

Skater

Hippies

Rapper

Page 49: BAQMaR 2008

REMEMBER ME

How this research changed our lives

Page 50: BAQMaR 2008

6 changes that rocked our socks off

• The tools for ID construction have changed

• New online quali methods proved to be efficient

• Our target group = the new and better quali researchers

• New reflection kit for entire MTV Networks staff

• Closer connection with MTV & new clients

• Redefine content strategy of TMF on screen & on line

Page 51: BAQMaR 2008
Page 52: BAQMaR 2008
Page 53: BAQMaR 2008
Page 54: BAQMaR 2008

4C Consulting

Introduction to our services

Page 55: BAQMaR 2008

Our Mission | Boosting your customer value

4C Consulting

helps companies

win, keep and grow

customer value

Page 56: BAQMaR 2008

Our Solutions | Call us for…

Customer Value

Strategy

Process

Excellence

Customer

Insight

Business

requirements

definition

Package selection

& implementation

Post-launch

care

Page 57: BAQMaR 2008

1. Acquire new customers

2. Sell more to existing customers

• More of the same (increase turnover)

• Expand value proposition portfolio

(cross-sell & product development *)

• Upgrade value

proposition (Upsell)

3. Prevent existing

customers from leaving

1. Efficient Delivery (Process

Excellence)

2. Align value propositions

3. Advanced pricing

Our Focus | Boosting your customer value

Increase Revenue Reduce Costs

Page 58: BAQMaR 2008

Business Intelligence Practice

SCV ROMSCCI Competing on Analytics

Audit

Infrastruc-tuur

Data Quality

Exploitatie Coaching

Page 59: BAQMaR 2008

Why 4C Consulting | 7 compelling reasons

1. 100% focus on customer value

management

2. Result-oriented project approach

3. Connecting marketing, sales &

customer care with senior management

and IT

4. Independent consultant for 10 years

5. Experienced crew, passionate about

marketing, sales & customer care

6. Value based pricing model

7. Satisfied & loyal customers: 90

customers, more than 380 projects

Page 60: BAQMaR 2008

60

Page 61: BAQMaR 2008

Optimize your business with Business

&Decision

Michel Meulders

- Domain Manager -

Page 62: BAQMaR 2008

Business & Decision Benelux

Business & Decision Benelux is :

• a multi-specialist, in specific

technology fields :

• Business Intelligence

• Customer Relationship Management

• Life sciences

• Risk & Compliance

• with foreign offices in Brussels,

Amsterdam, Luxembourg

• Top accounts in finance, pharma,

telco, distribution, industry (Fortis, ING,

ABN Amro, Dexia, GSK, UCB, Proximus,

Belgacom, Carrefour, Honda,…)

Founded in 2002

Merger of several companies specialised in BI

Consulting & System Integrator :

- More than 300 consultants

- About 18 mio Euro turnover in 2007

- 58% organic growth comparing to 2006

- Last acquisition : BnV Group (BE+NL)

Turnover evolution (consolidated)

0

5000

10000

15000

20000

25000

2004 2005 2006 2007 Obj 2008

Th

ou

sa

nd

s

Belgium Luxembourg Netherlands

Page 63: BAQMaR 2008

For more info see http://www.businessdecision.com

Page 64: BAQMaR 2008

Belgian Federation of

Market Research Institutes

www.febelmar.be

Page 65: BAQMaR 2008
Page 66: BAQMaR 2008
Page 67: BAQMaR 2008

Febelmar mission

Development and promotion of market

research and opinion polls in Belgium

Protecting the sector interests

Watching over correct use of deontological

rules of market research in all phases of the

market research process

Stimulating continuous improvement of

quality of service in market research

Being a platform for communication,

exchange of expertise and networking

Page 68: BAQMaR 2008

Members

27 agencies.

Together they represent about 75% of

the total market research expenditures in

Belgium.

Page 69: BAQMaR 2008
Page 70: BAQMaR 2008
Page 71: BAQMaR 2008

What is I4BI?

i4bi is specialized in implementing Business Intelligence Solutions in

your company.

Our team of BI experts has deep functional and technical experience

with Application development, Business Model definitions and Data

Warehousing. Functional/technical designs, development, application

role out, training etc… are phases in a project where our consultants

have many years of experience.

i4bi consultants have a deep knowledge of the Oracle Business

Intelligence products and solutions.

i4bi sponsors the development of an independent analytical branch,

which will probably see the light in 2009

Page 72: BAQMaR 2008

What do we provide?

We Provide

Business

Experience

Analytical

Expertise

To Support

Technical

Abilities

Strategic Decision Making

Page 73: BAQMaR 2008

Expertise

• Analytical Expertise• Data Mining

• Statistical modelling

• Predictive analysis

• Basel II compliant modelling

• Forecasting

• Business Analysis Expertise• Reporting

• Delivering Business Insight to decision makers

• Marketing Analysis

• Data Quality

• Use of technical tools such as SAS – SPSS – Statistica to

support & extend business knowledge

Page 74: BAQMaR 2008

Contact

• For more general information:

www.I4BI.be

• For more analytical information:

[email protected]

Page 75: BAQMaR 2008

InSites Consulting

6 beliefs in 60 seconds

Page 76: BAQMaR 2008

“Consumers are beginning in a very real sense to own our brands and participate. We need to begin to learn how to let go”

A.G. Lafley, CEO & Chairman of P&G

We believe ...in the empowered consumer

Human-to-human interactions are more powerful than ever and can make or break your brand

Page 77: BAQMaR 2008

We believe ...in giving back

Rewarding experiences for participantsActive involvement of panel members

Charity contributions

Page 78: BAQMaR 2008

We believe ...in connecting

Everything we do is aimed at strengthening connectionsbetween you, your market and us

“Connected Research” brings you closer to your market and taps into the wisdom of the crowds

Some of our connected research methods

Research communitiesBulletin boardsBlog research

Online discussion groups

More information: http://connectedresearch.insites.eu/

Page 79: BAQMaR 2008

We believe ...in the power of new research methodsfor better marketing decision making

InformationalProviding more depth to research insights

TransformationalDoing things that were previously not possible

AutomationalConducting research more efficiently

Page 80: BAQMaR 2008

We believe ...in 1 + 1 = 3

Old and new methods need to be optimally “fused” in order to fully grasp the new customer / consumer reality

Page 81: BAQMaR 2008

We believe ...in the power of our team

People make the differenceOpen, forward thinking, dedicated, passionate

Specific knowledge centers

Page 82: BAQMaR 2008

©Keyrus – all rights reserved

BAQMaR, 17 December 2008

Welcome to Keyrus

performance management consulting technology

Page 83: BAQMaR 2008

About Keyrus (Belgium)

• founded in 1996 as SOLIDPartners

• focus on performance management, business intelligence & data warehousing

• strong and balanced client base spread over different industries

• +100 consultants specialised in both technical and business domains

• part of Keyrus group (France)

Page 84: BAQMaR 2008

Keyrus‟ global footprint

head office in Paris

present in 9 countries

listed on Paris stock

exchange Euronext

+1300 employees

Page 85: BAQMaR 2008

Vision & mission

Keyrus will be one of the few leading service providers

in the area of performance management.

We help our clients to effectively design, build and operate

the adequate performance management organization and solutions

in an integrated end-to-end fashion.

Page 86: BAQMaR 2008

Portfolio of solutions & services

Information Management

IM

Business Intelligence Platforms

BIP

Analytic Applications

AA

People and Processes

P&P

Corporate Performance Management

(C)PM

data warehouse & data marts

dat

a d

eliv

ery

&

ou

tflo

w f

un

ctio

ns

CPM data delivery, exchange & synchronization

info

rmat

ion

m

anag

em

en

t fu

nct

ion

s

sou

rce

sys

tem

s &

ap

plic

atio

ns

BI L

aye

r (r

ep

ort

s, O

LAP,

d

ash

bo

ard

s, a

lert

s)

CPM Applications (e.g. planning, ABM, PA)

Analytic Applications (e.g. data mining)

Page 87: BAQMaR 2008

Keyrus [email protected]

Nijverheidslaan 3/2 B-1853 Strombeek-Bever

t +32 2 706 03 00f +32 2 706 03 09

Contact us

www.keyrus.be

17-Dec-2008performance management consulting technology

Page 88: BAQMaR 2008

Introduction of Profacts

Page 89: BAQMaR 2008

Who is PROFACTS ?

We are „the new kids on the block‟

in (online) market research ...

Page 90: BAQMaR 2008

Q4

2006

Q4

2008

Q1

2007

Q2

2007

Q3

2007

Q4

2007

Q1

2008

Q2

2008

Q3

2008

2people have

founded Profacts

6people are now

working @ Profacts

200%growth rate

REVEALING FACTORS FOR SUCCESS1 strategy

28yrs

mean age

Page 91: BAQMaR 2008

Profacts is active in more then 10 sectors ...

REVEALING FACTORS FOR SUCCESS

AUTOMOTIVE

GPS

TELECOM ENERGY

ICT PHARMACEUTICAL

FMCG

RECRUITMENT

INSURANCESBANKING

Page 92: BAQMaR 2008

Python Predictions

Page 93: BAQMaR 2008

Python Predictions

PREDICT

Page 94: BAQMaR 2008

Python Predictions

Page 95: BAQMaR 2008

Python Predictions

GROWTH

Page 96: BAQMaR 2008

Python Predictions

www.pythonpredictions.com

Page 97: BAQMaR 2008

Rogil Research

A research agency with a view

Page 98: BAQMaR 2008

MARKETING & SENSORY RESEARCH

OUR PASSION

FTF research (Mobile Unit)

Telephone research

Online research

Panel services

Fieldwork in Europe

Sensory research

Eye-tracking / Eye|watch

Tachistoscope

Trained panel

Consumer panel

Taste lab

Page 99: BAQMaR 2008
Page 100: BAQMaR 2008

Sensory Safari

Note down in your agenda

SENSORY SAFARI

• March 26th 2009

• 18u

• At Rogil in Leuven

Page 101: BAQMaR 2008

Sensory Safari

5 SENSES MARKETING

Page 102: BAQMaR 2008

We hope to welcome you in March.

Thanks for your attention !

Page 103: BAQMaR 2008

SAS Analytics For Challenging Times

Start Focused, Think Wide

Page 104: BAQMaR 2008

Campaign Managment Requires

Optimization

Page 105: BAQMaR 2008

CRM is becoming Risk Managment

Page 106: BAQMaR 2008

SAS Breadth of Analytic Offering…

• Statistical Analysis

• Survey Design/Analysis

• Data Mining

• Text Mining

• Time Series Mining

• Forecasting

• Quality Improvement

• Operations Research

Page 107: BAQMaR 2008

SAS Innovations in Marketing

Solutions….

Page 108: BAQMaR 2008

Copyright © 2006, SAS Institute Inc. All rights reserved.

http://www.sas.com/feature/analytics/index.html

Page 109: BAQMaR 2008

The Mission

Drive the widespread use of

data in decision making

Page 110: BAQMaR 2008

The Focus

Attract Grow Fraud RiskRetain

Driving and Maximizing Profit

Page 111: BAQMaR 2008

The Vision

Behavioral dataDescriptive data

Attitudinal dataInteraction data

Enterprise

Data

Sources

Operational

Processes

Operational

Processes

Enterprise

Data

Sources

Page 112: BAQMaR 2008

The Acceptance

• The rise of the agnostics

Science vs. Chance

In numbers we trust!

Page 113: BAQMaR 2008
Page 114: BAQMaR 2008
Page 115: BAQMaR 2008
Page 116: BAQMaR 2008

The myth of the „best‟ algorithm

lessons learned from innovations in

data sampling and data pre-processing

for marketing analytics

Dr. Sven F. CroneDeputy Director, Ass. Prof.

Page 117: BAQMaR 2008

Associated Experts

Prof. Paul Goodwin

Dr. Andrew Eaves

Research & PhD students

Heiko Kausch, RA

Stavros Asimakopoulos

Xi Chen

Bruce Havel

Suzi Ismail

Nikolaos Kourentzes

Ioannis Stamatopoulos

Andrey Davidenko

Charlotte Brown

Hong Juan Liu

T Hu

John Prest

Huang Tao

Visiting Researchers

Prof. Geoff Allen

Dr. Yukun Bao

Young-Sang Cho

Directors

Prof. Robert Fildes

Prof. Peter Young

Dr.Sven F. Crone

Researchers

Dr. Steve Finlay

Dr. Alastair Robertson

Dr. Didier Soopramanien

Dr. Kostas Nikolopoulos

Prof. Stephen Taylor

Dr. Wlodek Tych

Prof. David Peel

Prof. Peter Pope

Page 118: BAQMaR 2008

“Take away this pudding, it has no

theme.” Sir Winston Churchill (1915)

Page 119: BAQMaR 2008

• Sampling issues in Data Mining

• Case study 1: Direct Marketing• Cross-selling of Magazine subscriptions

• Effect of data preprocessing: Sampling

• Interaction of Sampling with Scaling & Coding

• Case study 2: Credit & Behavioral Scoring• Predicting consumer credit default

• Effects of sample size

• Effects of sample distribution

• Case study 3: Online Shopping Behaviour• Predicting consumer shopping channel choice

• Sample distribution & multiple classes

• Conclusion & Take-aways

Agenda

Page 120: BAQMaR 2008

Why (Under/Over) Sampling?

• Knowledge Discovery (KDD) = non-trivial process of identifying valid, novel, useful patterns in large data sets• Data Mining = only one single step in the KDD process• Data sample determines the whole process! ( GIGO)• “Research seems preoccupied with algorithms” [Hand 2000]

Monitoring

CRISP-DM Process

SAS SEMMA DM-Process

Page 121: BAQMaR 2008

Sampling in Direct Marketing Literature?

Input

type* Methods***

Paramete

r tuning

Data reduction** Data projection

Feature

Selection

Re-

sampling

Continuous attributes Categories

Standardisation Discretisation Coding

[2] 2 BMLP, LR, LDA, QDA X X

[42] 1 MLP, LR, CHAID X X

[43] 2 MLP, RBF, LR, GP, CHAID X X

[44] 3 MLP, LR, LDA X X

[4] 2 CHAID, CART X

[6] 2 MLP, LR X X X X X

[9] 2 LVQ, RBF, 22 DT, 9 SC X X

[45] 2LDA, LR, KNN, KDE, CART, MLP,

RBF, MOE, FAR, LVQX X

[3] 1 MLP X X

[7] 2 LSSVM X X X

[11] 2 LR, LS-SVM, KNN, NB, DT X X X

[10] 1LDA, QDA, LR, BMLP, DT, SVM,

LSSVM, TAN, LP, KNNX X

[46] 2 LR, MLP, BMLP X X

[47] 2LSSVM, SVM, DT, RL, LDA, QDA,

LR, NB, IBLX X

[48] 1 DT, MLP, LR, FC X

[49] 1 FC X X

Majority of direct marketing papers focus on algorithm tuning

Only 3 papers consider Resampling / Instance Selection

No analysis of the interaction with Sampling & Projection & …

Page 122: BAQMaR 2008

Database of customers (instances)

Known attributes for all customers (age, gender, existing subscriptions, …)

Known response (class membership) of buyers & non-buyers from past mailings

Build a model to separate classes decision boundary of different complexity

1 … Number of subscriptions … Many

Classification

Few

… D

ays s

ince last

purc

hase

… M

any

No responseSubscribed to magazine

Last campaign

Page 123: BAQMaR 2008

Class unknown

Use the decision boundary to classify unseen instances

Calculate on which side of hyperplane the instances lie (or distance)

Assign class to unseen instances

No responseSubscribed to magazine

Classification

1 … Number of subscriptions … Many

Few

… D

ays s

ince last

purc

hase

… M

any

Page 124: BAQMaR 2008

Balanced dataset = class distributions are equal P(x|y=A)=P(x|y=B)

proportional sampling or stratified sampling feasible

Imbalanced dataset = class distributions unequal P(x|y=A)>>P(x|y=B) `

The class of interest is often the minority (in most business applications)

Reality Check: Imbalanced classes

No responseSubscribed to magazine

Problem

• Classifiers are biased towards

the majority class

• Shifts the decision boundary

• Error / Accuracy based learning

creates naïve classifiers

• Invalid separation of classes1 … Number of subscriptions … Many

Few

… D

ays s

ince last

purc

hase

… M

any

Page 125: BAQMaR 2008

Size of the sample?

Distribution / location of the sample?

Imbalanced Data Sampling

No responseSubscribed to magazine

Stratified Random Sampling

divide DB in mutually exclusive

strata (subpopulations) & draw

random samples from each

Proportional

assure proportions in samples

equal those in population

Disproportional

weighted over-& undersampling

of important classes1 … Number of subscriptions … Many

Few

… D

ays s

ince last

purc

hase

… M

any

Page 126: BAQMaR 2008

Exclude random instances of the majority class

Retain all instances of the minority class

Establish a balanced class distribution

Random Undersampling

No responseSubscribed to magazine

Benefits

• Helps detect rare target levels

Risks

• Biases predictions (correctable)

• Looses information contained in

instances of the majority class

• Creates different boundaries

• Increases prediction variability

• …1 … Number of subscriptions … Many

Few

… D

ays s

ince last

purc

hase

… M

any

Page 127: BAQMaR 2008

Retain all instances of the majority class in the sample

Duplicate identical instances of the minority class

Establish a balanced class distribution

Random Oversampling

No responseSubscribed to magazine

Benefits

• Helps detect rare target levels

• No loss of information

Risks

• Biases predictions (correctable)

• Increases prediction variability

• Increases processing time

1 … Number of subscriptions … Many

Few

… D

ays s

ince last

purc

hase

… M

any

Page 128: BAQMaR 2008

rather some case studies ...!

Ready for more theory…?

x

Page 129: BAQMaR 2008

• Sampling issues in Data Mining

• Case study 1: Direct Marketing• Cross-selling of Magazine subscriptions

• Effect of data preprocessing: Sampling

• Interaction of Sampling with Scaling & Coding

• Case study 2: Credit & Behavioral Scoring• Predicting consumer credit default

• Effects of sample size

• Effects of sample distribution

• Case study 3: Online Shopping Behaviour• Predicting consumer shopping channel choice

• Sample distribution & multiple classes

• Conclusion & Take-aways

Agenda

Page 130: BAQMaR 2008

• Sell a magazine subscription to existing customers

• Whom to send mail to? (Which customers are most likely to respond?)

• How many customers to contact? (What is the optimal mailing size?)

Corporate project with leading German Publishing HouseProvided data set of past mailing campaigns

Benchmark novel methods against in-house SPSS Clementine

Explore Neural Networks (NN) an Support Vector Machines (SVM)

Business Case:

Direct Marketing/Response Optimization

Page 131: BAQMaR 2008

Smaller mailing (number of letters sent) lower costs (Euro 1.- per letter)

Higher response rate higher revenue

More specific mailing lower cost

More relevant information higher customer satisfaction

Benefits of Direct Marketing

Simple With data mining

Addressees 100.000 Top 40% = 40.000

Cost 2€/mail = 200.000€ 2,5€/mail = 100.000€

Response rate 0,5% = 500 1,0% = 400

Sales volume 300€ 300€

Sales volume 150.000€ 120.000€

Revenue -50.000€ 20.000€

Page 132: BAQMaR 2008

NN get worse with learning …

%Pred. C 0

Pred. C 1

Sum

C 0 72.96 27.04 100

C 1 62.02 37.98 100

134.98 65.02 55.47

%Pred. C 0

Pred. C 1

Sum

C 0 52.87 43.37 100

C 1 47.13 56.63 100

100 100 54.75

%PredC 0

PredC 1

Sum

C 0 61.86 38.14 100

C 1 55.09 44.81 100

116.95 82.95 54.26

• Wish to implement Neural Networks for next campaign• In-house team (with no NN knowledge) outperformed us EVERY TIME!• Analyzed software, training parameters, etc. internal competition• Observed expert in building models … !

Page 133: BAQMaR 2008

Scale numerical

features

Adjust imbalanced

class distributions

Decide on sample

size and method

Experimental Design:

Different data pre-processing

Handle categorical

features

Select useful

features

Handle outliers

Different SamplingOver-& Undersampling

Different Encodingn, n-1, thermo, ordinal

Different ScalingDiscretise, Standardise

Evaluate across 3 algorithms:

Neural Networks (MLPs), Support Vector Machines & Decision Trees

Page 134: BAQMaR 2008

Multifactorial design to evaluate impact across multiple methods

Neural Networks (NN)

Support Vector Machines (SVM)

Decision Trees (CART)

Dataset Structure

Data set size• 300,000 customer records• 4,019 subscriptions sold• Response rate of 1.3%

Data set structure• 18 categorical features• 35 numerical features• Binary target variable

Evaluated the Impact of Data Preprocessing

• Data Sampling (over sampling vs. undersampling)

• Categorical attribute Encoding (N, N-1, thermo, ordinal)

• Continuous attribute Projection (Binning vs. Normalisation)

• Continuous attribute Scaling ( [0,+1] vs. [-1,+1] range)

Page 135: BAQMaR 2008

Different balancing in the training data

Original distribution in the test data (65,000 instances)

Sampling

Data partition (number of records)

Oversampling Undersampling

Data subset Class 1 Class -1 Class 1 Class -1

Training set 20,000 20,000 2,072 2,072

Validation set 10,000 10,000 1,035 1,035

SUM 30,000 30,000 3,107 3,107

Test (hold-out) set 912 64,088 912 64,088

Created 2 Dataset Sampling candidates

Page 136: BAQMaR 2008

Oversampling outperforms undersampling consistently!

Gain in Lift depends on method (different sensitivity)

Oversampling has higher impact than data coding & scaling

Results

Increase

Increase

Increase

Page 137: BAQMaR 2008

Binning & Scaling of continuous attributes irrelevant for all methods!

Use Undersampling & N-1 encoding with SVM & NN

Best preprocessed SVM lift of 0.645 on test set … BUT …

Recommendations from Case Study

• Sampling • Oversampling outperfoms undersampling for all methods

• Undersampling: better in-sample results & worse out of sample

• Choice of method • NN & SVM better than CART

• Encoding & Projection• SVM: avoid Ordinal coding (e.g. 1,2,3) all other similar (incl. N !)

• NN: avoid standardization & ordinal encoding

• DT / CART: use temperature, all others similar (incl. ordinal)

Page 138: BAQMaR 2008

Results are consistent across error measures

Experiments allow identification of „best practices‟ to model methods

Best-practice preprocessing varies between methods

Results across Pre-processing

Preprocessing: higher impact than method selection Lift-variation per method from Sampling/Scaling/Coding

> Difference of Lift between competing methods!

DTSVMNN

Method

0,65

0,64

0,63

0,62

0,61

0,60

Lif

t te

st

Lift performance onTest data subset

DTSVMNN

Method

0,58

0,57

0,56

0,55

0,54

0,53

AM

te

st

Arithmetic Mean Performanceon Test data subset

DTSVMNN

Method

0,58

0,57

0,56

0,55

0,54

0,53

0,52

0,51

0,50

GM

te

st

Geometric Mean Performanceon Test data subset

DPP causes 50%-70% of the

differences between models

Page 139: BAQMaR 2008

• Sampling issues in Data Mining

• Case study 1: Direct Marketing• Cross-selling of Magazine subscriptions

• Effect of data preprocessing: Sampling

• Interaction of Sampling with Scaling & Coding

• Case study 2: Credit & Behavioral Scoring• Predicting consumer credit default

• Effects of sample size

• Effects of sample distribution

• Case study 3: Online Shopping Behaviour• Predicting consumer shopping channel choice

• Sample distribution & multiple classes

• Conclusion & Take-aways

Agenda

Page 140: BAQMaR 2008

Business Case: Predicting

Customer Online Shopping Adoption

• Traditional buying process is offline & simultaneous “bricks” store

• Introduction of the Internet changes consumer behaviour• Seek information online & offline

• Purchasing online & offline

Changing purchasing behaviour through internet adoption

Changing purchasing behaviour through Technology Acceptance

• Development of heterogeneous Purchasing Behaviour• Example: Purchasing electronic durable consumer goods

• Search for product info (e.g. video cameras) online

test product in-store

search for best deal on internet & purchase

Search for Information Online Purchase Online

Search for Information Offline Purchase Offline

Online

Shoppers

Non-Internet

Shoppers

Browsers

Page 141: BAQMaR 2008

Stages of Internet Adoption

1. OFFLINE BUYERS

Information gathering

& purchasing in Stores

2.BROWSERS

Information gathering online

& purchasing in stores

3.ONLINE BUYERS

Information gathering

& purchasing online

Page 142: BAQMaR 2008

Motivation

DIDIER: Marketing Modelling

• Econometric / Marketing Domain

• Seeks to explain how customers behave in

online shopping

• Use of „black-box” logistic regression

models

Models class membership to identify

causal variables that explain choices

Descriptive & Normative Modelling

SVEN: Data Mining Perspective

• IS/OR/MS Domain Data Mining

• Seeks to accurately predict regardless of

explanation why customers buy

• Use of “black-box” methods from

computational intelligence

Models class membership to

accurately classify unseen instances

Predictive Modelling

same dataset & same objectives & similar methods

Conflicting “best practice” approaches to modelling

Outside of most software simulators!!! Implicit knowledge?

… WHO IS “CORRECT”? WHAT IS THE IMPACT?

Best practices

balance datasets for distribution

representative of population

Use ordinal variables & nominal variables

without recoding

Do not normalise / scale data

Best practices

Rebalance datasets for equal distribution

of target variables

Recode ordinal binary scale

Rescale & normalise data to facilitate

learning speed etc.

Page 143: BAQMaR 2008

Dataset

• Survey on Internet Shopping Behaviour• 5500 UK households 685 respondents

• Adjusted for age, income etc. of customers (older less likely to buy)

• Adjusted for product specific risk of online shopping for branded durable consumer goods (inspection required to some extent)

• 73 questions on factors related to internet shopping, products etc.

Models Output VariablesInput Variables

Demographics

Internet

specific

Factors

Online

shopping

specific

Factors

Logistic Regression

Neural Networks

Class 1:

Browse Ônline &

Buy Online

Class 2:

Browse Online &

Buy Offline

Class 3:

Browse &

Buy Offline

Online Shopping Factors:

“Going to the shops is as convenient

as Internet shopping”

“I would buy online if products are

branded” etc. [1=strongly agree; …]

Demographic Factors

Age, Gender, Income

Internet Utility Factors

Score from 6 correlated variables

Mixed scale of nominal, ordinal, interval

Page 144: BAQMaR 2008

Imbalanced Classification problem

UndersamplingOversamplingImbalanced

Dataset

Offline-Shoppers

BrowsersOnline-Shoppers

Offline-Shoppers

BrowsersOnline-Shoppers

Offline-Shoppers

BrowsersOnline-Shoppers

400

300

200

100

0

Co

un

t

Test

Validation

Training

Data Subset

UndersamplingOversamplingImbalanced

Dataset

Offline-Shoppers

BrowsersOnline-Shoppers

Offline-Shoppers

BrowsersOnline-Shoppers

Offline-Shoppers

BrowsersOnline-Shoppers

400

300

200

100

0

Co

un

t

Test

Validation

Training

Data Subset

• Split of Dataset for Training, Validation and Test {50%;25;25%}• Distribution of target classes is skewed

{65% online buyers; 22.5% browsers; 12.5% offline shoppers}• Rebalancing of data sets through over- & undersampling)

Page 145: BAQMaR 2008

Results without DiscretisationLogist.Reg. True

Value

Training Data Test Data

Dataset Online Browse Offline Online Browse Offline

Original Online 93.36 5.17 1.48 88.89 7.78 3.33

Imbalanced Browser 62.77 23.40 13.83 49.39 22.58 29.03

Offline 36.54 17.31 46.15 35.29 29.41 35.29

Under- Online 57.69 30.77 11.54 64.44 23.33 12.22

Sampling Browser 26.92 48.08 25.00 32.26 25.81 41.94

Offline 17.31 21.15 61.54 29.41 35.29 35.29

Over- Online 68.27 24.35 7.38 74.44 16.67 8.89

Sampling Browser 30.63 43.91 25.46 35.48 29.03 35.48

Offline 16.97 19.93 63.10 29.41 29.41 41.18

Neural Net Training Data Test Data

Dataset Online Browse Offline Online Browse Offline

Original Online 86.19 12.71 1.10 86.67 8.89 4.44

Imbalanced Browser 53.13 31.25 15.63 41.94 35.48 22.58

Offline 25.17 28.57 45.71 29.41 35.29 35.29

Under- Online 44.86 40.00 17.14 27.78 58.89 13.33

Sampling Browser 14.29 48.57 37.14 16.13 32.26 51.61

Offline 8.57 20.00 71.43 11.76 41.18 47.06

Over- Online 81.22 18.23 0.55 61.11 22.22 16.67

Sampling Browser 14.92 83.43 1.66 19.35 77.42 3.23

Offline 15.52 0.55 99.45 0.00 11.76 88.24

MCRtrain=54.3%

MCRtest =48.9%

MCRtrain=55.8%

MCRtest =41.8%

MCRtrain=58.4%

MCRtest =48.2%

MCRtrain=54.4%

MCRtest =52.5%

MCRtrain=54.9%

MCRtest =35.7%

MCRtrain=88.0%

MCRtest =75.6%

Mean Classification Rate (%)

Page 146: BAQMaR 2008

Results with Discretisation of OrdinalLogist.Reg. True

Value

Training Data Test Data

Dataset Online Browse Offline Online Browse Offline

Original Online 91.51 6.64 1.85 85.56 7.78 6.67

Imbalanced Browser 54.26 36.17 9.57 48.39 32.26 19.35

Offline 26.92 17.31 55.77 58.82 47.62 17.65

Under- Online 71.15 21.15 7.69 55.56 24.44 20.00

Sampling Browser 17.31 65.38 17.31 67.74 6.45 25.81

Offline 15.38 11.54 73.08 58.82 0.00 41.18

Over- Online 68.63 22.88 8.49 70.0 21.11 8.89

Sampling Browser 17.34 56.83 25.83 12.90 58.06 29.03

Offline 13.28 14.02 72.69 17.65 23.53 58.82

Neural Net Training Data Test Data

Dataset Online Browse Offline Online Browse Offline

Original Online 96.13 3.87 0.00 84.44 11.11 4.44

Imbalanced Browser 68.75 28.13 3.13 64.52 22.58 12.90

Offline 40.00 14.29 45.17 58.82 11.76 29.41

Under- Online 57.14 40.00 2.86 25.56 72.22 2.22

Sampling Browser 34.29 54.29 11.43 67.74 29.03 3.23

Offline 14.29 31.43 54.29 52.94 17.65 29.41

Over- Online 98.34 1.10 0.55 58.89 24.44 16.67

Sampling Browser 0.00 100.0 0.00 3.23 83.87 12.90

Offline 0.00 0.00 100.0 0.00 5.88 94.12

MCRtrain=61.15%

MCRtest =45.1%

MCRtrain=69.9%

MCRtest =34.4%

MCRtrain=66.0%

MCRtest =62.3%

MCRtrain=56.5%

MCRtest =45.5%

MCRtrain=55.2%

MCRtest =28.0%

MCRtrain=99.5%

MCRtest =79.0%

Mean Classification Rate (%)

Page 147: BAQMaR 2008

Oversampling outperforms other samplings- Across Different Datasets

- Across various data preprocessing

Methods show different sensitivity to Sampling- More variation from sampling, coding & scaling than between methods

- Using different preprocessing variants is important in modeling

Various sophisticated extensions exist- SMOTE (Synthetic Minority Oversampling Technique)

- K-nearest Neighbor sampling (removal / creation)

- One-class learning etc. …

Extend your bad of tricks …- … and experiment with imbalanced sampling!

Summary

Page 148: BAQMaR 2008

Sven F. CroneLancaster University Management School

Centre for Forecasting

Lancaster, LA1 4YX

email [email protected]

Questions?

1 1(1 ) t t tSY Y SY

Page 149: BAQMaR 2008

Exploring Innovation

“Online panel” vs “Online „streaming/convenience”

sampling

Page 150: BAQMaR 2008

Unfortunately, the presenters of iVOX &

Corelio cannot share their presentation

with the BAQMaR community due to

reasons of confidentiality!

Page 151: BAQMaR 2008
Page 152: BAQMaR 2008
Page 154: BAQMaR 2008
Page 155: BAQMaR 2008
Page 156: BAQMaR 2008
Page 158: BAQMaR 2008