Innovative design methods for data science projects - beyond brainstorming

Akın O. Kazakçı, [email protected]

Centre for Data Science, January 7th, 2014



Plan

1. Introduction

2. Potential contribution of design theory

Akın O. Kazakçı, MINES ParisTech

Design Theory and Methods for Innovation
•  Chair for Research and Education
•  Fundamental Research on Design Theory
•  11 Industrial Sponsors
•  Theory, Field research, History, Laboratory experiments

CDS: Peculiar Characteristics & Lots of Unknowns

•  What is data science?
–  You have 10 seconds. Please avoid dictionary definitions. And no, do not use a list of subdomains.

•  Is this a new form of organisation? Which model?
–  Neither private R&D nor traditional research lab.

•  How to unify and align researchers' interests?
–  Would traditional incentives be enough?

•  What is the overall project for CDS?
–  How to build a joint long-term vision with clearly articulated (scientific or not) objectives?


Gartner’s Hype Cycle

Cabane et al. 2014, Understanding the Role of Collective Imaginary in the Dynamics of Expectations, Int. Prod. Dev. Mana. (IPDM) Conf.

Are there strategies that would allow a « smooth landing »?

Average DSI Curve

« Smooth-Lander » DSI

Innovative DSI

How to reach the plateau of productivity?

How to reach it before others and lead the way?

Which methods, processes or principles would allow building innovation strategies for DSIs?

How would a data science initiative (e.g. centres or groups) generate high-potential projects that can lead to breakthrough results?


Profound Transformation of NPD activities

•  New functional spaces
•  New user experiences
•  New competencies
•  New partnerships
•  New business models
•  Fuzzy industrial sectors

→ 3rd Industrial revolution (Le Masson et al., 2006)
→ New Products vs. New Product Types
→ Revision of Objects’ Identities (Hatchuel et al., 1999)

New products vs. New product categories 

A300, A340

Main functions and design parameters are maintained

Rule-based design

Rule-breaking design

•  New functional spaces
•  New competencies
•  New partnerships
•  New business models

Innovation: optimisation or identity change?

Innovation as « optimisation »

Innovation as « identity change »

How to capture revision of identities?
–  A concept-knowledge theory of design

« Design specs »: traditional object definitions

Knowledge: methods, judgements, R&D competencies…

An example of design specs for locomotive engines (1890s)

In design, objects can be defined by a « design spec », a list of features (or properties). The designer (individual or group) needs some knowledge specific to each « feature » to be able to implement (or build) it and to handle interactions.

Revision of identities as « Dual expansive reasoning »

Concept expansions; knowledge expansions

In « innovative design », both design specs and the associated knowledge are « dissolved » and « made to evolve ».

Source: Wikipedia; Hatchuel 96; Hatchuel and Weil 99, 02; Kazakci and Tsoukias 03; Kazakci 07


C-K design theory: a breakthrough in understanding design

C-K design theory describes innovative design as the interaction and joint expansion of concepts and knowledge.

•  Collective reasoning and action on desired, unknown and undecidable objects

•  Two spaces for exploring: the space of concepts (arborescent exploration of unfeasible specifications) and the knowledge space (propositions about the world, all kinds of knowledge)

•  Operations for identity change: expansive partitions (flying ship, free newspaper, mobile museum, camera-glass, …)

A revival of the design theory field: Yoshikawa, 81; Suh, 91; Braha and Reich, 03; Shai and Reich, 03; Research in Engineering Design, Special Issue on Design Theory (2013), …

Plan

1. Introduction

2. Potential contribution of design theory

Methods:

–  Innovation Field Mapping

–  KCP Process

Brainstorming is not enough!

Concept

Knowledge: classic K; new K for the engine manufacturer

C-K for Innovation Field Mapping

What is the Open Rotor innovation field?

Project with Snecma (Brogard, Joanny, 2010; Chaire TMCI)

Exploring the classic engines improvements

Changing plane and flying experience

How to go beyond traditional design paths?

C-K for Innovation Field Mapping

Monitoring progress with cross-validation

Achieve 5σ

Select a classification method

Pre-processing

Choose hyper-parameters

Train

Optimize for accuracy

SVM, Decision Trees, NN, …

Integrate AMS directly in training

–  during Gradient Boosting (John)
–  during node split in random forest (John)

Weighted Classification Cascades

Two participants observe that AMS can be refactored and its terms rewritten in their convex conjugate form, which allows the Fenchel-Young inequality from the convex optimization literature to be applied. Optimization of AMS becomes possible by a procedure they name Weighted Classification Cascades (rank: 461st). Ref: Mackey & Brian, http://arxiv.org/pdf/1409.2655v2.pdf

Gradient boosting methods fit a classifier to the 'per data point loss', and since AMS is not a sum of per data point (event) losses, it's not obvious how to use AMS as a loss in gradient boosting (Andre Holzner)

AMS: 3.3 → The node split works by looking for the split that maximises the AMS of one side of the split when predicting it as pure signal (John)

An alternative may be to « use AUC in gradient boosting till you get to the max CV result and then try to move forward with an AMS loss function from that point. In principle, the approximate AMS function is differentiable (http://tinyurl.com/ov5pedq) at a node level (s and b being the totals of other nodes, considered constant, and x, w being the probability prediction and weight for the node to be split), and one could rewrite the part of the code where the objective function is evaluated, replacing the sums with a different calculation » (Giulio Casa)
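The node-split idea quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the AMS formula and the regularisation term b_reg = 10 are taken from the HiggsML challenge definition rather than from these slides, the function names `ams` and `best_ams_split` are hypothetical, and only the upper side of each split is scored as pure signal (the participants' actual code may differ).

```python
import math

B_REG = 10.0  # assumption: regularisation term from the HiggsML AMS definition


def ams(s, b, b_reg=B_REG):
    """Approximate median significance for weighted signal s and background b."""
    if s <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def best_ams_split(values, labels, weights):
    """Scan thresholds on one feature (hypothetical helper).

    Events with value > threshold are predicted as pure signal; the function
    returns the (threshold, ams) pair with the highest AMS on that side.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Start with every event on the "signal" side.
    s = sum(w for w, y in zip(weights, labels) if y == 1)
    b = sum(w for w, y in zip(weights, labels) if y == 0)
    best = (None, 0.0)
    for pos, i in enumerate(order[:-1]):
        # Move event i to the rejected side of the split.
        if labels[i] == 1:
            s -= weights[i]
        else:
            b -= weights[i]
        nxt = order[pos + 1]
        if values[nxt] == values[i]:
            continue  # no threshold fits between identical values
        threshold = 0.5 * (values[i] + values[nxt])
        score = ams(s, b)
        if score > best[1]:
            best = (threshold, score)
    return best
```

Because AMS depends on the totals s and b rather than on a sum of per-event losses, the scan recomputes it from running totals at each candidate threshold, which is the point made in the comments above.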

C space; K space

Design for statistical efficiency

1st, 2nd, 3rd

Ensembles + selecting a cutoff threshold that optimises (or stabilises) AMS

Design strategy analysis for HiggsML challenge teams

Reduce within-class imbalance

C space; K space

Dealing with CIP

By adjusting class distribution

Working in input space

Re-representing inputs

Local distortion

Produce an embedding

Change spatial resolution

For some X

X is a support vector

With raw data

Feature engineering

Exploratory (knowledge or intuition based)

Automated

Genetic Algorithms (Wasilowski, Chen, 2009)

Reduce between-class imbalance

Reduce both

Costs are known

Oversampling signals

Undersampling the background

Identifying class distribution

Progressive sampling

By duplicating

By synthesizing new points

SMOTE (Chawla, Bowyer et al., 2002)

MSMOTE (Hu et al., 2009)

Borderline SMOTE (Han et al., 2005)

Adaptive Synthetic Sampling (He et al., 2008)

SafeLevel Sampling (Bunkhumpornpat et al., 2008)
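To make « oversampling by synthesizing new points » concrete, here is a minimal sketch in the spirit of SMOTE. It is a simplified illustration, not the reference algorithm from the cited papers: the function name `smote_like`, the default k = 3 and the fixed seed are assumptions, and it needs at least two minority points.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (simplified sketch)."""
    rng = random.Random(seed)

    def dist2(p, q):
        # Squared Euclidean distance between two points.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority points by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != base_idx]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic
```

Each synthetic point lies on a segment between two existing signal points, so the new points stay inside the region already occupied by the minority class instead of duplicating events exactly.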

Resample: each mixture contains all signals + some background, such that all background points are used in at least one mixture
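The resampling scheme just described can be sketched as follows. The function name `balanced_mixtures` and the round-robin dealing of background points are assumptions made for illustration; the slide only states the constraint that every background point ends up in at least one mixture.

```python
import random


def balanced_mixtures(signals, background, n_mixtures, seed=0):
    """Build n_mixtures training sets: each holds every signal event plus a
    distinct share of the background events (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(background)
    rng.shuffle(shuffled)
    # Deal background points round-robin so each one lands in exactly one mixture.
    chunks = [shuffled[i::n_mixtures] for i in range(n_mixtures)]
    return [list(signals) + chunk for chunk in chunks]
```

One learner can then be trained per mixture and the learners combined, which is where the meta-learning and ensemble entries that follow come in.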

Use meta-learning (Chan, Stolfo, 2001)

Use SVM ensemble (Yan, Lin et al., 2003)

Remove redundant (Kubat, Matwin, 1997)

Remove border regions with background examples (Kubat, Matwin, 1997)

Reduce overlap

Preferential sampling

Remove background whose average distance to its 3 NN is smallest (Mani, Zhang, 2003)

By adapting algorithms

Improve predictive accuracy

Reduce predictive variance

Alternative search techniques

Non-greedy methods

Genetic Alg.

Detect rare events: TimeWeaver

Discover small disjuncts (Carvalho, Freitas)

Change evaluation metrics

Simulated Annealing

Depth-bound exhaustive: Brute

Laplace estimate

Evaluate small disjuncts separately (Quinlan)

Modify definition of learning

Bias induction towards specificity

Minimize error costs

Change levels of learning

Cascade of learners

Learn only rare class

Two-level learning

Unknown costs

Modify base learner

Max Specificity (Acker, Porter, 1989)

Specificity for small disjuncts (Ting, 1989)

Base is a Tree Learner

Split attributes are selected to minimise total expected cost

Base is a NN

Cost-weighted error propagation

Relabeling for min expected cost

Test data

Training data

Weighting (Ting, 1998)

CSC (Witten, Frank, 2005)

MetaCost (Domingos, 1999)

Costing (Zadrozny et al., 2003)

Preprocessing

Cost-based sampling

Empirical Threshold Setting

Plot total cost for various thresholds

Choose min using plot

With Cross Validation

By choosing less steep hills

Thresholding (Sheng, Ling, 2006)
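Empirical threshold setting, as listed above, amounts to computing the total misclassification cost at each candidate cutoff and keeping the cheapest one. The sketch below assumes hypothetical costs (false positive = 1, false negative = 5) and hypothetical function names; in practice the scores would come from held-out or cross-validated predictions rather than from the training data.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting signal for score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # background wrongly accepted as signal
        elif predicted == 0 and label == 1:
            cost += cost_fn  # signal wrongly rejected
    return cost


def cheapest_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0):
    """Return (cost, threshold) for the cheapest cutoff among the observed scores.

    Ties on cost are broken in favour of the smallest threshold.
    """
    candidates = sorted(set(scores))
    return min((total_cost(scores, labels, t, cost_fp, cost_fn), t) for t in candidates)
```

The list of (cost, threshold) pairs is exactly what one would plot; preferring a minimum on a « less steep hill » would additionally require comparing each candidate with its neighbours, which this sketch leaves out.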

Using ensembles

Using cross validation

Cost-Sensitive Boosting

Imbalanced IVotes

AdaCost

Using sampling to alter weight distribution

Boosting

CSB

RareBoost

MSMOTE Boost

SMOTE Boost

Data Boost-IM

RUSBoost

Bagging

Overbagging

Underbagging

Under-Over-Bagging

Discovery condition: A discovery is claimed when we …

Problem formulation: Traditional classification setting…

Cross-Validation: Techniques for evaluating how a …

Ensemble Methods


Data science as a new frontier for design, A. Kazakci, ICED’15 (submitted)

DKCP process: Linearising C-K dynamics

Proven methodology:
–  Developed at Mines ParisTech (TMCI) with RATP and Thalès Avionics
–  40+ KCP by researchers (2002-2014)
–  2 PhD projects (Arnoux, 2013; Klasing Chen, in progress)
–  Now, a network of specialist consultants

Initialisation

[K] Knowledge sharing workshops

[P] Project building

[C] IFM-Design workshops

[RUN]

Try it! - Red Bull Gravity Challenge

You are a designer and you have been asked to produce the most creative solution to the following question: “Ensure that a hen's egg dropped from a height of 10 m does not break.”

© Agogué

Being innovative: how easy is that?

Your turn!

Experiments with 210 subjects (842 propositions)

“Fixation effects”: three types of solutions cover 81% of the results!

•  Slowing the fall
•  Protecting the egg
•  Damping the shock

Fixations on an object's identity

Got anything better?

Determining expansive path using C-K reasoning

Determining fixation path using C-K reasoning

Theory-driven experiments – SIG Design Theory 2012 – M.Cassotti & M.Agogué

C space; K space

Expanding both in the C-space and in the K-space for the “egg” task

Result 1 : the paths identified as fixation paths using C-K theory are the ones within the fixation effect for adults


(1) Natural distribution of solutions of a design task

Types of « fixation » based on C-K theory

Cognitive fixations

Social fixations

Limits of traditional methods for collective creativity

Consensus & shared understanding

Originality

Participative Seminars

Creative Commandos

→ Classical methods do not allow generating concepts that are both breakthrough and shared!

Fixation Phenomena

Isolation Phenomena

DKCP: Organising for shared breakthrough projects

Consensus & shared understanding

Originality

Fixation Phenomena

Isolation Phenomena

A method for steering the breakthrough process

DKCP process: Linearising C-K dynamics

Management of the cognitive and social aspects (KCP facilitators)

Innovation effort (participants: 20-50)

–  D: Project organisation
–  Pre-K: Defining and pre-exploration of K pockets
–  K: Sharing and integrating K
–  Pre-C: Orientation of phase C
–  C: Guided creativity
–  P: Building actionable strategies


Thank you!

Disclaimer: Copyrights of images belong to their respective owners.


Feel free to contact me for more: Akın O. Kazakçı, [email protected]