RAMP: Collaborative challenge with code submission

Upload: balazs-kegl

Post on 22-Jan-2018

TRANSCRIPT

Page 1: RAMP: Collaborative challenge with code submission

Center for Data Science Paris-Saclay

CNRS & University Paris Saclay TektosData

BALÁZS KÉGL

RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)

COLLABORATIVE CHALLENGE WITH CODE SUBMISSION

Page 2: RAMP: Collaborative challenge with code submission

A bit of context

Page 3: RAMP: Collaborative challenge with code submission

UNIVERSITÉ PARIS-SACLAY

19 founding partners

Page 4: RAMP: Collaborative challenge with code submission

UNIVERSITÉ PARIS-SACLAY

+ horizontal multi-disciplinary and multi-partner initiatives to create cohesion

Page 5: RAMP: Collaborative challenge with code submission

Center for Data Science Paris-Saclay

A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay

http://www.datascience-paris-saclay.fr/

Biology & bioinformatics: IBISC/UEvry, LRI/UPSud, Hepatinov, CESP/UPSud-UVSQ-Inserm, IGM-I2BC/UPSud, MIA/Agro, MIAj-MIG/INRA, LMAS/Centrale

Chemistry: EA4041/UPSud

Earth sciences: LATMOS/UVSQ, GEOPS/UPSud, IPSL/UVSQ, LSCE/UVSQ, LMD/Polytechnique

Economy: LM/ENSAE, RITM/UPSud, LFA/ENSAE

Neuroscience: UNICOG/Inserm, U1000/Inserm, NeuroSpin/CEA

Particle physics, astrophysics & cosmology: LPP/Polytechnique, DMPH/ONERA, CosmoStat/CEA, IAS/UPSud, AIM/CEA, LAL/UPSud

The Paris-Saclay Center for Data Science

Data Science for Scientific Data

250 researchers in 35 laboratories

Machine learning: LRI/UPSud, LTCI/Telecom, CMLA/Cachan, LS/ENSAE, LIX/Polytechnique, MIA/Agro, CMA/Polytechnique, LSS/Supélec, CVN/Centrale, LMAS/Centrale, DTIM/ONERA, IBISC/UEvry

Visualization: INRIA, LIMSI

Signal processing: LTCI/Telecom, CMA/Polytechnique, CVN/Centrale, LSS/Supélec, CMLA/Cachan, LIMSI, DTIM/ONERA

Statistics: LMO/UPSud, LS/ENSAE, LSS/Supélec, CMA/Polytechnique, LMAS/Centrale, MIA/AgroParisTech

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Domain science: human society, life, brain, earth, universe

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data scientist, Applied scientist, Domain scientist, Data engineer, Software engineer

@SaclayCDS

LIST/CEA

Page 6: RAMP: Collaborative challenge with code submission

Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain

Data scientist, Data trainer, Applied scientist, Domain scientist, Software engineer, Data engineer

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Tool building: software engineering, clouds/grids, high-performance computing, optimization

• (The lack of) manpower

• especially at the interfaces

• industrial brain-drain

• Incentives

• data scientists are not incentivized to work on domain science

• scientists are not incentivized to work on tools

• Access

• no well-developed channels to identify the right experts for a given problem

• Tools

• few tools that can help domain scientists and data scientists to collaborate efficiently

CHALLENGES

https://medium.com/@balazskegl

Page 7: RAMP: Collaborative challenge with code submission

TWO ANALYTICS TOOLS FOR INITIATING DOMAIN-DATA SCIENCE INTERACTIONS

RAPID ANALYTICS AND MODEL PROTOTYPING

(RAMP)

DATA CHALLENGES

Page 8: RAMP: Collaborative challenge with code submission

DATA CHALLENGES

Page 9: RAMP: Collaborative challenge with code submission

DATA CHALLENGES

• The HiggsML challenge on Kaggle

• https://www.kaggle.com/c/higgs-boson

Page 10: RAMP: Collaborative challenge with code submission

HUGE PUBLICITY

B. Kégl / AppStat@LAL, Learning to discover

CLASSIFICATION FOR DISCOVERY

Page 11: RAMP: Collaborative challenge with code submission

SIGNIFICANT IMPROVEMENT OVER THE BASELINE

Page 12: RAMP: Collaborative challenge with code submission

HUGE PUBLICITY

SIGNIFICANT IMPROVEMENT OVER THE BASELINE

yet partially missing the objectives

Page 13: RAMP: Collaborative challenge with code submission

• Challenges are useful for

• generating visibility in the data science community about novel application domains

• fairly benchmarking state-of-the-art techniques on well-defined problems

• finding talented data scientists

• Limitations

• not necessarily suited to solving complex, open-ended data science problems in realistic environments

• no direct access to solutions and data scientists

• emphasizes competition

DATA CHALLENGES

Page 14: RAMP: Collaborative challenge with code submission

HUGE PUBLICITY

We decided to design something better

Page 15: RAMP: Collaborative challenge with code submission

• Prototyping

• Training

• Human resources

• Collaboration building, networking

• Social science observatory

RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)

Page 16: RAMP: Collaborative challenge with code submission

RAMPS

• Single-day coding sessions

• 20-40 participants

• preparation is similar to challenges

• Goals

• focusing and motivating top talents

• promoting collaboration, speed, and efficiency

• solving (prototyping) real problems

Page 17: RAMP: Collaborative challenge with code submission

ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE

Page 18: RAMP: Collaborative challenge with code submission

ANALYTICS TOOL TO PROMOTE COLLABORATION AND CODE REUSE

Page 19: RAMP: Collaborative challenge with code submission

RAMPS

www.ramp.studio

software + management

backend is open source: https://github.com/camillemarini/datarun

Page 20: RAMP: Collaborative challenge with code submission

2015 Jan 15 The HiggsML challenge

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 21: RAMP: Collaborative challenge with code submission

2015 Apr 10 Classifying variable stars

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 22: RAMP: Collaborative challenge with code submission

VARIABLE STARS

Page 23: RAMP: Collaborative challenge with code submission

VARIABLE STARS

accuracy improvement: 89% to 96%

Page 24: RAMP: Collaborative challenge with code submission

2015 June 16 and Sept 26 Predicting El Niño

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 25: RAMP: Collaborative challenge with code submission

RMSE improvement: 0.9°C to 0.4°C

RAPID ANALYTICS AND MODEL PROTOTYPING
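
For readers unfamiliar with the metric quoted on this slide, here is a minimal sketch of how root mean squared error (RMSE) is computed. The function name `rmse` and the sample values are illustrative assumptions, not the scoring code or data used in the RAMP.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical temperature anomalies in °C (illustrative values, not RAMP data).
observed = [0.5, -0.2, 1.1, 0.0]
predicted = [0.1, 0.2, 0.7, 0.4]
print(rmse(observed, predicted))
```

Because the errors are squared before averaging, RMSE is expressed in the same unit as the target (here °C) and penalizes large misses more than small ones.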

Page 26: RAMP: Collaborative challenge with code submission

2015 October 8 Insect classification

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 27: RAMP: Collaborative challenge with code submission

accuracy improvement: 30% to 70%

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 28: RAMP: Collaborative challenge with code submission

2016 February 10 Macroeconomic agent-based models

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 29: RAMP: Collaborative challenge with code submission

f1-score improvement: 0.57 to 0.63

RAPID ANALYTICS AND MODEL PROTOTYPING
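
As a reminder of what the f1-score on this slide measures, the sketch below computes it for a binary problem as the harmonic mean of precision and recall. The slide does not say whether the RAMP score was binary or averaged over several classes, so the binary setting, the function name `f1_score_binary`, and the labels are assumptions for illustration only.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 score for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (illustrative values, not RAMP data).
truth = [1, 1, 1, 0, 0, 0]
guess = [1, 1, 0, 1, 0, 0]
print(f1_score_binary(truth, guess))
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is the f1-score.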

Page 30: RAMP: Collaborative challenge with code submission

2016 February 13 Epidemium cancer survival rate

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 31: RAMP: Collaborative challenge with code submission

RMSE improvement: 3000 to 300

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 32: RAMP: Collaborative challenge with code submission

2016 May 11 Drug identification from spectra

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 33: RAMP: Collaborative challenge with code submission

Drug identification error improvement: 9% to 3%

Drug concentration error improvement: 20% to 12%

RAPID ANALYTICS AND MODEL PROTOTYPING

Page 34: RAMP: Collaborative challenge with code submission

• Fast development of analytics solutions

• Teaching support

• Networking and HR support

• Support for collaborative team work

• Commercialized through TektosData

THE RAMP TOOL

A prototyping tool for collaborative development of data science workflows

Page 35: RAMP: Collaborative challenge with code submission

• We have a cool tool for collaborative data analytics

• designing workflows beyond scikit-learn predictors

• Data management/munging is a big part of the data analytics workflow; we need tools for it

• preparing a RAMP takes two weeks to six months

• Big data is rare: our problems are more about flexible organization of heterogeneous data

TAKE HOME MESSAGES

Page 36: RAMP: Collaborative challenge with code submission

THANK YOU!