Innovative design methods for data science projects: beyond brainstorming
Akın O. Kazakçı
Centre for Data Science, January the 7th, 2014
Plan
1. Introduction
2. Potential contribution of design theory
Akın O. Kazakçı, MINES ParisTech
Design Theory and Methods for Innovation
• Chair for Research and Education
• Fundamental Research on Design Theory
• 11 Industrial Sponsors
• Theory, Field research, History, Laboratory experiments
CDS: Peculiar Characteristics & Lots of Unknowns
• What is data science?
– You have 10 seconds. Please avoid dictionary definitions. And no, do not use a list of subdomains.
• Is this a new form of organisation? Which model?
– Neither private R&D nor a traditional research lab.
• How to unify and align researchers' interests?
– Would traditional incentives be enough?
• What is the overall project for CDS?
– How to build a joint long-term vision with clearly articulated (scientific or not) objectives?
Gartner’s Hype Cycle
Cabane et al. 2014, Understanding the Role of Collective Imaginary in the Dynamics of Expectations, Int. Prod. Dev. Mgmt. (IPDM) Conf.
Are there strategies that would allow a « smooth landing »?
Average DSI Curve
« Smooth-Lander » DSI
Innovative DSI
How to reach the plateau of productivity?
How to reach it before others and lead the way?
Which methods, processes or principles would allow building innovation strategies for DSIs?
How would a data science initiative (e.g. centres or groups) generate high-potential projects that can lead to breakthrough results?
Plan
1. Introduction
2. Potential contribution of design theory
Profound Transformation of NPD activities
• New functional spaces
• New user experiences
• New competencies
• New partnerships
• New business models
• Fuzzy industrial sectors
→ 3rd industrial revolution (Le Masson et al., 2006)
→ New products vs. new product types
→ Revision of objects’ identities (Hatchuel et al., 1999)
Main functions and design parameters are maintained
Rule-based design
Rule-breaking design
• New functional spaces
• New competencies
• New partnerships
• New business models
Innovation: optimisation or identity change?
Innovation as « optimisation »
Innovation as « identity change »
How to capture revision of identities?
– A concept-knowledge theory of design
Traditional object definitions:
« Design specs » | Knowledge (methods, judgements, R&D competencies…)
An example of design specs for locomotive engines (1890s)
In design, objects can be defined by a « design spec »: a list of features (or properties). The designer (individual or group) needs knowledge specific to each feature to be able to implement (or build) it and to handle interactions between features.
Revision of identities as « dual expansive reasoning »
Concept expansions | Knowledge expansions
In « innovative design », both design specs and the associated knowledge are « dissolved » and « made to evolve ».
Source: Wikipedia; Hatchuel 96; Hatchuel and Weil 99, 02; Kazakci and Tsoukias 03; Kazakci 07
C-K design theory: a breakthrough in understanding design
C-K design theory describes innovative design as the interaction and joint expansion of concepts and knowledge.
• Collective reasoning and action on desired, unknown and undecidable objects
• Two spaces for exploring: the space of concepts (arborescent exploration of unfeasible specifications) and the knowledge space (propositions about the world, all kinds of knowledge)
• Operations for identity change: expansive partitions (flying ship, free newspaper, mobile museum, camera-glass, …)
A revival of the design theory field: Yoshikawa, 81; Suh, 91; Braha and Reich, 03; Shai and Reich, 03; Research in Engineering Design, Special Issue on Design Theory (2013), …
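The interplay between the two spaces can be illustrated with a toy data structure. This is only a minimal Python sketch, not part of the talk: the `Concept` class, the `partition` method and the `knowledge` dictionary are hypothetical names, and the rule used to tell a restrictive partition from an expansive one (an attribute already known in K for the base object is restrictive, otherwise expansive) is an assumption taken from the usual reading of C-K theory.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in C space: a base object plus the attributes added so far."""
    base: str
    attributes: tuple = ()
    children: list = field(default_factory=list)

    def partition(self, attribute, knowledge):
        """Add an attribute, creating a child concept.

        Assumption: the partition is 'restrictive' when K already lists the
        attribute as a known property of the base object, else 'expansive'.
        """
        known = knowledge.get(self.base, set())
        kind = "restrictive" if attribute in known else "expansive"
        child = Concept(self.base, self.attributes + (attribute,))
        self.children.append(child)
        return child, kind


# K space: a toy map from objects to their known properties (hypothetical).
knowledge = {"ship": {"floats", "has a hull", "carries cargo"}}

root = Concept("ship")
_, kind_known = root.partition("carries cargo", knowledge)
flying_ship, kind_new = root.partition("flies", knowledge)

# A knowledge expansion: once K contains the new property, the same
# attribute no longer revises the identity of the object.
knowledge["ship"].add("flies")
_, kind_after = root.partition("flies", knowledge)

print(kind_known, kind_new, kind_after)
```

In this toy reading, "flying ship" is expansive only as long as K says nothing about ships that fly, which is one way to picture why C and K have to expand together.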
Plan
1. Introduction
2. Potential contribution of design theory
Methods:
– Innovation Field Mapping
– KCP Process
Concept | Knowledge
Classic K | New K for the engine manufacturer
C-K for Innovation Field Mapping
What is the Open Rotor innovation field?
Project with Snecma (Brogard, Joanny, 2010), Chaire TMCI
Exploring improvements to classic engines
Changing the plane and the flying experience
How to go beyond traditional design paths?
C-K for Innovation Field Mapping
monitoring progress with cross-validation
Achieve 5σ
Select a classification method
Pre-processing
Choose hyper-params
Train
Optimize for accuracy
SVM, Decision Trees, NN, …
Integrate AMS directly in training
– during gradient boosting (John)
– during node split in random forest (John)
Weighted Classification Cascades
Two participants observe that AMS can be refactorized and its terms rewritten in their convex conjugate form, which allows the Fenchel-Young inequality from the convex optimization literature to be applied. Optimization of AMS becomes possible through a procedure they name Weighted Classification Cascades (rank: 461st). Ref: http://arxiv.org/pdf/1409.2655v2.pdf, Mackey & Bryan
Gradient boosting methods fit a classifier to the 'per data point loss', and since AMS is not a sum of per data point (event) losses, it is not obvious how to use AMS as a loss in gradient boosting (Andre Holzner)
AMS: 3.3 → The node split works by looking for the split that maximises the AMS of one side of the split when predicting it as pure signal (John)
An alternative may be to « use AUC in gradient boosting till you get to the max cv result and then tried to move forward with an AMS loss function from that point ». « In principle, the AMS approximate function is derivable (http://tinyurl.com/ov5pedq) at a node level (s and b being the totals of other nodes, considered constant, and x, w being the probability prediction and weight for the node to be split) and one could rewrite the part of code where the objective function is evaluated, replacing the sums with a different calculation » (Giulio Casa)
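The remark that AMS is not a sum of per-event losses can be checked numerically. A minimal Python sketch, assuming the HiggsML challenge's definition of the approximate median significance, AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)) with b_reg = 10; the formula is not given on these slides, and the event weights below are made up for illustration.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for selected signal weight s and
    selected background weight b (b_reg is the regularisation term)."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


# Two hypothetical groups of selected events: (signal weight, background weight).
group_a = (40.0, 500.0)
group_b = (25.0, 300.0)

# Scoring each group separately and adding the results...
separate = ams(*group_a) + ams(*group_b)
# ...differs from scoring the pooled selection, so AMS does not
# decompose into per-group (or per-event) terms.
pooled = ams(group_a[0] + group_b[0], group_a[1] + group_b[1])

print(round(separate, 3), round(pooled, 3))
```

Because the objective depends on the totals s and b of the whole selection, a per-event loss has to be derived from it indirectly, which is what the approaches quoted above attempt.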
C space | K space
Design for statistical efficiency
1st, 2nd, 3rd
ensembles + selecting a cutoff threshold that optimises (or stabilises) AMS
Design strategy analysis for HiggsML challenge teams
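The "selecting a cutoff threshold" step can be sketched as a simple scan over candidate cutoffs. This is a hedged Python illustration, not the teams' actual code: `best_cutoff` is a hypothetical helper, the AMS formula is the HiggsML definition with b_reg = 10 (an assumption, as it is not stated on the slides), and the scores, labels and weights are toy values. Only the "optimise" variant is shown; stabilising AMS would need a smoother criterion.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance (HiggsML definition, assumed)."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def best_cutoff(scores, labels, weights):
    """Scan every observed score as a cutoff and keep the one whose
    selected region (score >= cutoff) has the highest AMS."""
    best = (None, -1.0)
    for cut in sorted(set(scores)):
        s = sum(w for x, y, w in zip(scores, labels, weights) if x >= cut and y == 1)
        b = sum(w for x, y, w in zip(scores, labels, weights) if x >= cut and y == 0)
        value = ams(s, b)
        if value > best[1]:
            best = (cut, value)
    return best


# Toy classifier outputs: label 1 = signal, 0 = background (heavily weighted).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
weights = [50.0, 50.0, 50.0, 2.0, 50.0, 2.0, 2.0, 2.0]

cut, value = best_cutoff(scores, labels, weights)
print("cutoff:", cut, "AMS:", round(value, 3))
```

On real data the scan would normally be run on held-out folds, since a cutoff tuned on the training scores tends to give an optimistic AMS.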
C space | K space: Dealing with CIP
Reduce within-class imbalance
By adjusting class distribution
Working in input space
Re-representing inputs
Local distortion
Produce an embedding
Change spatial resolution
For some X
X is a support vector
With raw data
Feature engineering
Exploratory (knowledge or intuition based)
Automated
Genetic Algorithms (Wasikowski, Chen, 2009)
Reduce between-class imbalance
Reduce both
Costs are known
Oversampling signals
Undersampling the background
Identifying class distribution
Progressive sampling
by duplicating
by synthesizing new points
SMOTE (Chawla, Bowyer et al., 2002)
MSMOTE (Hu et al., 2009)
Borderline SMOTE (Han et al., 2005)
Adaptive Synthetic Sampling (He et al., 2008)
SafeLevel Sampling (Bunkhumpornpat et al., 2008)
resample
each mixture contains all signals + some background, such that all background points are used in at least one mixture
Use meta-learning (Chan, Stolfo, 2001)
Use SVM ensemble (Yan, Liu et al., 2003)
Remove redundant (Kubat, Matwin, 1997)
Remove border regions with background examples (Kubat, Matwin, 1997)
Reduce overlap
Preferential sampling
Remove background whose average distance to its 3 NN is smallest (Mani, Zhang, 2003)
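The "mixture" branch of this part of the tree (all signals plus some background in each mixture, every background point used at least once) can be sketched as follows. A minimal Python illustration under stated assumptions: `background_mixtures` is a hypothetical helper, the event identifiers are toy strings, and the background is split into disjoint chunks, which is one simple way (not the only one) to satisfy the "at least one mixture" condition.

```python
import random


def background_mixtures(signal, background, n_mixtures, seed=0):
    """Shuffle the background, cut it into n_mixtures disjoint chunks and
    pair each chunk with the full signal set."""
    rng = random.Random(seed)
    shuffled = background[:]
    rng.shuffle(shuffled)
    # Striding keeps chunk sizes within one element of each other.
    chunks = [shuffled[i::n_mixtures] for i in range(n_mixtures)]
    return [(signal, chunk) for chunk in chunks]


signal = [f"s{i}" for i in range(3)]
background = [f"b{i}" for i in range(10)]
mixtures = background_mixtures(signal, background, n_mixtures=3)

for sig, bkg in mixtures:
    print(len(sig), "signal +", len(bkg), "background")
```

Each mixture would then train one base learner, and the learners are combined afterwards, for instance by the meta-learning or SVM-ensemble approaches named in the tree.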
By adapting algorithms
Improve predictive accuracy
Reduce predictive variance
Alternative search techniques
Non-greedy methods
Genetic Alg.
Detect rare events: TimeWeaver ()
Discover small disjuncts (Carvalho, Freitas, )
Change evaluation metrics
Simulated Annealing
Depth-bound exhaustive: Brute ()
Laplace estimate
Evaluate small disjuncts separately: Quinlan ()
Modify definition of learning
Bias induction towards specificity
Minimize error costs
Change levels of learning
Cascade of learners
Learn only rare class ()
Two-level learning ()
Unknown Costs
Modify base learner
Max Specificity (Acker, Porter, 1989)
Specificity for small disjuncts (Ting, 1989)
Base is a Tree Learner
Split attributes are selected to minimise total expected cost
Base is a NN
Cost-weighted error propagation
Relabeling for min expected cost
Test data
Training data
Weighting (Ting, 1998)
CSC (Witten, Frank, 2005)
MetaCost (Domingos, 1999)
Costing (Zadrozny et al., 2003)
Preprocessing
Cost-based sampling
Empirical Threshold Setting
Plot total cost for various thresholds
Choose min using plot
With Cross Validation
by choosing less steep hills
Thresholding (Sheng, Ling, 2006)
Using ensembles
Using cross validation
Cost-Sensitive Boosting
Imbalanced IVotes ()
AdaCost ()
Using sampling to alter weight distribution
Boosting
CSB ()
RareBoost ()
MSMOTE Boost ()
SMOTE Boost ()
Data Boost-IM ()
RUSBoost ()
Bagging
Overbagging ()
Underbagging ()
Under-Over-Bagging ()
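The "empirical threshold setting" node in the tree above (compute total cost for various thresholds, pick the minimum) lends itself to a short sketch. This Python example is only illustrative: `total_cost` and `cheapest_threshold` are hypothetical names, the misclassification costs and scores are made up, and a printed table stands in for the plot. The cross-validated variant mentioned in the tree is not shown.

```python
def total_cost(threshold, scores, labels, cost_fp, cost_fn):
    """Total misclassification cost when predicting signal for score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # background accepted as signal
        elif predicted == 0 and label == 1:
            cost += cost_fn  # signal rejected
    return cost


def cheapest_threshold(scores, labels, cost_fp, cost_fn, thresholds):
    """Return (cost, threshold) for the cheapest threshold; on ties, min() keeps the lowest threshold."""
    return min((total_cost(t, scores, labels, cost_fp, cost_fn), t) for t in thresholds)


# Toy scores; label 1 = rare class. Missing a rare event costs 5x a false alarm.
scores = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
labels = [0, 0, 1, 0, 1, 1, 1]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

for t in thresholds:
    print(t, total_cost(t, scores, labels, cost_fp=1.0, cost_fn=5.0))

best_cost, best_t = cheapest_threshold(scores, labels, 1.0, 5.0, thresholds)
print("chosen threshold:", best_t, "cost:", best_cost)
```

The "less steep hills" remark in the tree suggests preferring a minimum whose neighbouring thresholds are also cheap; the sketch above simply takes the global minimum.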
Discovery condition: A discovery is claimed when we …
Problem formulation: Traditional classification setting …
Cross-validation: Techniques for evaluating how a …
Ensemble methods
Data science as a new frontier for design, A. Kazakci, ICED’15 (submitted)
DKCP process: Linearising C-K dynamics
Proven methodology:
– Developed at Mines ParisTech (TMCI) with RATP and Thales Avionics
– 40+ KCPs by researchers (2002-2014)
– 2 PhD projects (Arnoux, 2013; Klasing Chen, in progress)
– Now, a network of specialist consultants
Initialisation
[K] Knowledge sharing workshops
[C] IFM-Design workshops
[P] Project building
[RUN]
Try it! Red Bull Gravity Challenge
You are a designer and you have been asked to produce the most creative solution to the following question: “Ensure that a hen's egg dropped from a height of 10 m does not break.”
© Agogué
Being innovative: how easy is that?
Your turn!
Experiments with 210 subjects (842 propositions)
“Fixation effects”: three types of solutions (slowing the fall, protecting the egg, damping the shock) cover 81% of the results!
Fixations on an object's identity
You got anything better???
Determining the expansive path using C-K reasoning | Determining the fixation path using C-K reasoning
Theory-driven experiments – SIG Design Theory 2012 – M. Cassotti & M. Agogué
C space | K space
Expanding both in the C-space and in the K-space for the “egg” task
Result 1: the paths identified as fixation paths using C-K theory are the ones within the fixation effect for adults
(1) Natural distribution of solutions of a design task
Types of « fixation » based on C-K theory
Cognitive fixations
Social fixations
Limits of traditional methods for collective creativity
Consensus & shared understanding
Originality
Participative seminars
Creative commandos
→ Classical methods do not allow generating concepts that are both breakthrough and shared!
Fixation phenomena
Isolation phenomena
DKCP: Organising for shared breakthrough projects
Consensus & shared understanding
Originality
Fixation phenomena
Isolation phenomena
A method for steering the breakthrough process
DKCP process: Linearising C-K dynamics
Management of the cognitive and social aspects (KCP facilitators)
Innovation effort (participants; 20-50)
D: Project organisation
Pre-K: Defining and pre-exploring K pockets
K: Sharing and integrating K
Pre-C: Orientation of phase C
C: Guided creativity
P: Building actionable strategies
Thank you!
Disclaimer: Copyrights of images belong to their respective owners.
Feel free to contact me for more:
Akın O. Kazakçı [email protected]