Data Innovation Summit - Made in Belgium 2015

Data Innovation Summit 2015. Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi: Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information, 26/3/2015

Upload: andrea-dal-pozzolo

Post on 15-Jul-2015

Data Innovation Summit 2015

Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi

Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information

26/3/2015

About Me

• I'm a PhD student
• Machine learning techniques for fraud detection in electronic transactions

Academic partner: MLG - ULB

• Researchers with expertise in data mining, computational modelling, statistics and their applications to fraud detection, bioinformatics and time series prediction.

Industrial partner: Worldline

• Worldline is a leader in electronic payment services.

• Its Brussels team of experts has more than 25 years of expertise in fraud detection.

The problem

• Growing presence of frauds
• It is not easy for a human analyst to detect fraudulent patterns --> need for automatic fraud detection systems

Challenges

1. Concept drift (i.e. customers' spending habits change)

2. Unbalanced classification (i.e. few frauds)

[Figure: scatter plot of dataset$X1 against dataset$X2]

3. The true class label is available only for the few alerted and checked transactions.

[Figure: predictive model]

Goal of Detection

With a limited budget, only a few transactions can be manually checked.

Goal: limit the false alerts for a given budget
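The goal stated on this slide can be read as maximising the share of true frauds among the alerts that fit in the budget. The sketch below is one way to measure that, assuming the classifier outputs a risk score per transaction and that label 1 means fraud. The function name `alert_precision` and its parameters are illustrative and do not come from the talk.

```python
import numpy as np

def alert_precision(scores, labels, budget):
    """Fraction of true frauds among the `budget` highest-scored transactions.

    Assumptions (not from the slides): higher score = riskier, label 1 = fraud.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    k = min(budget, len(scores))
    if k <= 0:
        return 0.0
    # Indices of the k riskiest transactions, i.e. the alerts investigators can check.
    top = np.argsort(-scores, kind="stable")[:k]
    return float(labels[top].mean())

# Two alerts allowed; the two riskiest transactions contain one fraud.
precision = alert_precision([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], budget=2)
```

Every alert that is not a fraud consumes part of the budget, so a higher value means fewer false alerts for the same number of checks.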

Two types of transactions

[Figure: time axis from t−δ to t, showing feedbacks F_t (supervised samples) and delayed samples D_{t−δ}. Legend: all fraudulent transactions of a day, all genuine transactions of a day, fraudulent transactions in the feedback, genuine transactions in the feedback.]

Data streams

• Feedbacks:
  – Classifier dependent
  – Small set of risky transactions

• Delayed samples:
  – Large set of mostly genuine transactions

[Figure: timelines for Day 1, Day 2 and Day 3, each showing feedbacks F_t, F_{t−1}, ..., F_{t−6} and delayed samples D_{t−7}, D_{t−8}, D_{t−9}, with labels F_t and S_t. Legend: fraudulent transactions, genuine transactions, fraudulent feedback, genuine feedback.]
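One reading of the timeline on this slide is that, at day t, feedbacks are available for the most recent δ days (F_t down to F_{t−6} when δ = 7), while fully labelled delayed samples exist only for older days (D_{t−7}, D_{t−8}, ...). The sketch below encodes that reading. The function name and the default values are assumptions taken from the figure labels, not a definition given in the talk.

```python
def training_days(t, delta=7, n_delayed=2):
    """Return (feedback_days, delayed_days) usable for training at day t.

    Assumed reading of the figure: feedbacks cover days t, t-1, ..., t-delta+1;
    delayed samples cover the n_delayed days ending at t-delta.
    """
    if delta < 1 or n_delayed < 0:
        raise ValueError("delta must be >= 1 and n_delayed must be >= 0")
    # Days whose alerts were already checked by investigators.
    feedback_days = list(range(t, t - delta, -1))
    # Older days for which all labels have arrived after the delay.
    first_delayed = t - delta
    delayed_days = list(range(first_delayed, first_delayed - n_delayed, -1))
    return feedback_days, delayed_days

feedback_days, delayed_days = training_days(10)
```

With t = 10 this gives feedback days 10 down to 4 and delayed days 3 and 2, matching the F_t ... F_{t−6}, D_{t−7}, D_{t−8} layout of the figure.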

Learning strategy

[Figure: two timelines over feedbacks F_t, ..., F_{t−6} and delayed samples D_{t−7}, D_{t−8}. Standard solution: a single classifier W_t. Proposed solution: classifiers F_t and W^D_t, aggregated into A^W.]
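The proposed solution combines a classifier trained on feedbacks (F_t) with one trained on delayed samples (W^D_t). The slides do not state the aggregation rule, so the sketch below assumes a weighted average of the two posterior fraud probabilities, with equal weights by default. The function name and the `weight` parameter are hypothetical.

```python
import numpy as np

def aggregate_posteriors(p_feedback, p_delayed, weight=0.5):
    """Combine fraud probabilities from the feedback and delayed classifiers.

    Assumption (not from the slides): a convex combination, where `weight`
    is the share given to the feedback classifier.
    """
    p_feedback = np.asarray(p_feedback, dtype=float)
    p_delayed = np.asarray(p_delayed, dtype=float)
    if p_feedback.shape != p_delayed.shape:
        raise ValueError("both classifiers must score the same transactions")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_feedback + (1.0 - weight) * p_delayed

# Scores for two transactions from each classifier.
combined = aggregate_posteriors([0.8, 0.2], [0.4, 0.0])
```

Because the feedback set is much smaller than the delayed set, equal weights already give each feedback sample far more influence than it would have inside a single pooled training set.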

Concept drift adaptation

• Learn from new concepts and forget outdated transactions using a sliding window.

[Figure: timelines for Day 1 and Day 2 over feedbacks F_t, ..., F_{t−6} and delayed samples D_{t−7}, D_{t−8}, D_{t−9}, with classifiers F_t and W^D_t shown for each day.]
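A sliding window of daily batches can be sketched with a bounded queue: adding a new day automatically drops the oldest one, which is how outdated transactions are forgotten. The class name and the window length below are illustrative assumptions; the talk does not specify an implementation.

```python
from collections import deque

class SlidingWindow:
    """Keep only the most recent `n_days` daily batches of transactions."""

    def __init__(self, n_days):
        # A deque with maxlen discards the oldest batch when a new one is appended.
        self._days = deque(maxlen=n_days)

    def add_day(self, batch):
        self._days.append(list(batch))

    def training_set(self):
        # Flatten the retained days, oldest first, into one training list.
        return [tx for day in self._days for tx in day]

window = SlidingWindow(n_days=2)
window.add_day(["a"])
window.add_day(["b", "c"])
window.add_day(["d"])  # the first day ("a") is forgotten here
current = window.training_set()
```

Retraining the classifier on `training_set()` after each `add_day` call gives a model that tracks recent behaviour at the cost of discarding older data.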

Unbalanced delayed samples

• Randomly remove genuine transactions to balance the delayed samples before training

[Figure: undersampling turns an unbalanced dataset into a balanced dataset]
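Random undersampling can be sketched as follows. The slide does not give the target class ratio, so this example assumes a 1:1 balance and that label 1 marks a fraud. The function name and the `seed` parameter are illustrative.

```python
import numpy as np

def undersample_genuine(X, y, seed=0):
    """Randomly drop genuine transactions until both classes have equal size.

    Assumptions (not from the slides): label 1 = fraud, label 0 = genuine,
    and a 1:1 target ratio.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    genuine_idx = np.flatnonzero(y == 0)
    # Keep every fraud and an equally sized random subset of genuine samples.
    n_keep = min(len(fraud_idx), len(genuine_idx))
    kept = rng.choice(genuine_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([fraud_idx, kept]))
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([1, 0, 0, 0, 0, 0, 1, 0, 0, 0])
X_bal, y_bal = undersample_genuine(X, y)
```

Only the training data is rebalanced here; scores from a model trained this way are skewed towards the fraud class, which matters if they are later interpreted as probabilities.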

Experiments on a real dataset

[Figure: results for W and A^W]

Experiments on a synthetic dataset with concept drift

[Figure: results for W and A^W]

Benefits of the proposed solution

• Exploits the feedbacks from investigators
• Meets realistic working conditions

• Gives feedbacks a large influence relative to delayed samples.

Conclusion

• Alert-feedback interaction has to be considered when designing fraud detection systems

• Feedbacks from investigators have to be handled separately.

• Aggregating two distinct classifiers is an effective solution for concept drift.

Future work

1. Adaptive classifier aggregation

2. Implementation in Big Data architectures

3. This work will be continued in the BruFence project

BruFence

• Big Data Mining for Fraud Detection and Security
• 2015-2018, funded by Innoviris (Brussels Region).

BruFence - consortium

• ULB - QualSec

• ULB - MLG

• UCL - MLG

• Worldline

• Steria

• NViso

Spice

Thank  you  for  listening