Post on 12-Dec-2021


Page 1: Machine Learning and Data Science develop innovative

We help industry design and develop innovative solutions leveraging Machine Learning and Data Science

Expert or machine? Both!

Page 2: Machine Learning and Data Science develop innovative

Who we are

Guillaume Forcade

CEO & Co-founder

Jean-Baptiste Pautrizel

PhD Physics & Co-founder

Adrien Todeschini

PhD Stat ML & Chief Data scientist

Kévin Baudin

MSc. Stat Finance & Data scientist

Wassek Al Chahid

Full stack developer

Page 3: Machine Learning and Data Science develop innovative

What we do

Credit scoring

Recommender systems

Betting fraud detection

Traffic prediction

Advertising targeting

Genomics meta-analysis

Buy/Sell signals

Algo trading

Technical/fundamental analysis

Commodities

GlobalWineScore.com

Noodle.vote

NIPS / ICML

Data Science

Consulting

Quant Finance

Research

The Lab

Page 4: Machine Learning and Data Science develop innovative

What we use

Data Engineering

Data collection and storage

Data transformation

Data visualization

Machine Learning and Statistics

Supervised learning: classifiers, regressors

Unsupervised learning: dimensionality reduction, clustering

Trees and ensemble methods

Topic modeling

Latent factor models

Matrix factorization

Bayesian inference

Probabilistic programming

Nonparametric statistics

Missing data imputation

Decision theory

Monte Carlo simulation

Score aggregation

Programming

Languages and tools

Web development

Cloud computing

Page 5: Machine Learning and Data Science develop innovative

GlobalWineScore.com

Page 6: Machine Learning and Data Science develop innovative
Page 7: Machine Learning and Data Science develop innovative

Genomics Meta-analysis

Page 8: Machine Learning and Data Science develop innovative

WineRec

Page 9: Machine Learning and Data Science develop innovative

Nips17.ml

Page 10: Machine Learning and Data Science develop innovative

Noodle.vote

Page 11: Machine Learning and Data Science develop innovative

Noodle.vote

Page 12: Machine Learning and Data Science develop innovative

Market predictions

Page 13: Machine Learning and Data Science develop innovative

Algo Trading

Page 14: Machine Learning and Data Science develop innovative

Preacor.fr

Page 15: Machine Learning and Data Science develop innovative

Preacor.fr

Page 16: Machine Learning and Data Science develop innovative

Preacor.fr

Page 17: Machine Learning and Data Science develop innovative
Page 18: Machine Learning and Data Science develop innovative

Data: ~8 years of history, ~35,000 applications, ~70 columns

Application: numero_dossier, date_creation, etape_dossier, nb_emp

Income: revenu_alloc, revenu_mensuel, catpro, type_contrat, employeur, anciennete_job

Loans: nb_prets_immo, crd_prets_immo, mensu_prets_immo, nb_prets_conso, crd_prets_conso, mensu_prets_conso

Expenses: charge_pension, charge_impots, charge_loyer, charge_garde_enfants, charge_autre, charge_autre_nature

(Co-)Borrower: cp, ville, naissance, isproprio, isloc, isheberge, situ, nbenfants, banque, anciennete_banque

Assets: mnt_epargne, nb_biens, valeur_biens, revenu_biens

Project: cp_bien_projet, ville_bien_projet, surface_bien_projet, terrain_bien_projet, usage_projet, isprimoaccedant

Financing: mnt_terrain, mnt_logement, mnt_travaux, mnt_frais_agence, mnt_frais_notaire, honos_courtier, mnt_apport, mnt_finance, banque_pret, nb_prets_differe_partiel, duree_mois_financement

refus_motif

Page 19: Machine Learning and Data Science develop innovative

Variables absent from the data

● Type of self-employed (liberal) profession

● Banking behavior: average balance, overdrafts

● Company size

● Entrepreneurs' tax filings

● Use of a builder

● Construction damage insurance

Page 20: Machine Learning and Data Science develop innovative

Target variable

Page 21: Machine Learning and Data Science develop innovative

Filtering relevant cases

● type_op = purchase of an existing property with renovation work, purchase of an existing property without renovation work, turnkey new-build purchase, off-plan new-build purchase (VEFA), land purchase + construction

● refus_motif = account behavior, debt ratio, residual income, outside criteria, premium too high

● This leaves ~13,000 applications, of which ~8% are refusals (class imbalance)
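The filtering step on this slide can be sketched in Python as follows. Only the column names type_op and refus_motif come from the slides; the helper names, the English category labels, the toy records, and the convention that an empty refus_motif means "not refused" are all assumptions made for illustration.

```python
# Hypothetical category labels (English stand-ins for the values on the slide).
RELEVANT_OPS = {
    "existing_with_works", "existing_without_works",
    "new_turnkey", "new_off_plan", "land_plus_construction",
}
RELEVANT_REFUSALS = {
    "account_behavior", "debt_ratio", "residual_income",
    "outside_criteria", "premium_too_high",
}

def filter_relevant(applications):
    """Keep applications with a relevant operation type that were either
    not refused (refus_motif is None, an assumption) or refused for a
    relevant reason."""
    kept = []
    for app in applications:
        if app["type_op"] not in RELEVANT_OPS:
            continue
        reason = app["refus_motif"]
        if reason is None or reason in RELEVANT_REFUSALS:
            kept.append(app)
    return kept

def refusal_rate(applications):
    """Share of refused applications, a quick measure of class imbalance."""
    if not applications:
        return 0.0
    refused = sum(1 for app in applications if app["refus_motif"] is not None)
    return refused / len(applications)

# Toy records, invented for illustration only.
toy = [
    {"type_op": "new_turnkey", "refus_motif": None},
    {"type_op": "new_turnkey", "refus_motif": "debt_ratio"},
    {"type_op": "loan_buyback", "refus_motif": None},         # irrelevant operation
    {"type_op": "new_off_plan", "refus_motif": "withdrawn"},  # irrelevant reason
    {"type_op": "existing_with_works", "refus_motif": None},
    {"type_op": "existing_without_works", "refus_motif": None},
]
relevant = filter_relevant(toy)
rate = refusal_rate(relevant)
```

Computing the refusal rate right after filtering makes the imbalance visible early, before any model is fitted.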

Page 22: Machine Learning and Data Science develop innovative

Pipeline

Import

Clean

Transform Explore

Impute

Model

Evaluate

Page 23: Machine Learning and Data Science develop innovative

Cleaning

Page 24: Machine Learning and Data Science develop innovative

Transformation

Feature engineering

● Debt ratio (percentage)

● Residual income per person

● Net asset value

● Total net annual income

● Renovation work as a percentage of the property price

● Down payment as a percentage of the project cost

● Savings remaining after the transaction, relative to the project cost
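The slide names the engineered features but not their formulas. A minimal Python sketch, assuming plausible definitions built from the column names listed on the data slide, could look like this. Every formula, the helper names, and the toy record are assumptions, not the authors' definitions.

```python
def ratio(num, den):
    """Safe division: returns None when the denominator is zero."""
    return num / den if den else None

def engineer_features(a):
    """Derive ratio features from one application dict `a`.
    All formulas below are assumed; the slides only name the features."""
    monthly_income = a["revenu_mensuel"] + a["revenu_alloc"]
    monthly_debt = a["mensu_prets_immo"] + a["mensu_prets_conso"]
    monthly_charges = a["charge_pension"] + a["charge_impots"] + a["charge_loyer"]
    household = a["nb_emp"] + a["nbenfants"]          # borrowers + children
    project_cost = a["mnt_terrain"] + a["mnt_logement"] + a["mnt_travaux"]
    outstanding = a["crd_prets_immo"] + a["crd_prets_conso"]
    return {
        "debt_ratio": ratio(monthly_debt, monthly_income),
        "residual_income_per_person": ratio(
            monthly_income - monthly_debt - monthly_charges, household),
        "net_assets": a["mnt_epargne"] + a["valeur_biens"] - outstanding,
        "annual_income": 12 * monthly_income,
        "works_share": ratio(a["mnt_travaux"], a["mnt_logement"]),
        "down_payment_share": ratio(a["mnt_apport"], project_cost),
        "savings_after_share": ratio(a["mnt_epargne"] - a["mnt_apport"], project_cost),
    }

# Toy application, invented for illustration only.
toy_app = {
    "revenu_mensuel": 3000, "revenu_alloc": 200,
    "mensu_prets_immo": 600, "mensu_prets_conso": 200,
    "charge_pension": 0, "charge_impots": 200, "charge_loyer": 0,
    "nb_emp": 2, "nbenfants": 2,
    "mnt_terrain": 0, "mnt_logement": 200000, "mnt_travaux": 50000,
    "crd_prets_immo": 80000, "crd_prets_conso": 5000,
    "mnt_epargne": 50000, "valeur_biens": 150000, "mnt_apport": 25000,
}
features = engineer_features(toy_app)
```

Ratios are used rather than raw amounts so that applications of very different sizes remain comparable.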

Page 25: Machine Learning and Data Science develop innovative

Imputation

Page 26: Machine Learning and Data Science develop innovative

Problems encountered

● Irrelevant applications → discarded

● Outliers → thresholded

● Untyped or badly encoded data → parsing

● Missing data → imputation

● Imbalanced classes → oversampling

● Absent or under-represented variables → Bayesian model
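Three of the remedies listed above (thresholding, imputation, oversampling) can be sketched in a few lines of Python. The slides do not say which imputation or oversampling method was used, so median imputation and random oversampling with replacement are assumptions here, as are the function names, the bounds, and the toy values.

```python
import random
from statistics import median

def clip(values, low, high):
    """Threshold outliers: values outside [low, high] are pulled to the bound.
    Missing values (None) are left untouched."""
    return [None if v is None else min(max(v, low), high) for v in values]

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def oversample(rows, labels, seed=0):
    """Random oversampling: draw minority-class rows with replacement until
    both classes have the same size. Assumes binary labels 0/1 and at least
    one row in each class."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit

# Toy column and labels, invented for illustration only.
incomes = [2500, None, 3100, 900000, 1800]
clipped = clip(incomes, 0, 20000)
imputed = impute_median(clipped)
rows = [[v] for v in imputed]
labels = [0, 0, 0, 1, 0]          # 1 = refused (minority class)
balanced_rows, balanced_labels = oversample(rows, labels)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.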

Page 27: Machine Learning and Data Science develop innovative

Bayesian inference

Page 28: Machine Learning and Data Science develop innovative

Bayesian logistic regression

Likelihood

Prior

Posterior
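The formulas on this slide did not survive extraction. As a hedged reconstruction, consistent with the JAGS code on the following slide but omitting its co-borrower term and per-record scale for brevity, a Bayesian logistic regression with Gaussian priors can be written as below. The notation k(j) for the variable group of coefficient j is an assumption introduced here.

```latex
\begin{align*}
\text{Likelihood:}\quad & y_i \mid \beta_0, \beta \sim \mathrm{Bernoulli}(p_i), \qquad
  \operatorname{logit}(p_i) = \beta_0 + x_i^\top \beta, \quad i = 1, \dots, n \\
\text{Prior:}\quad & \beta_0 \sim \mathcal{N}(0, \sigma^2), \qquad
  \beta_j \sim \mathcal{N}\bigl(\mu_j, \sigma_{k(j)}^2\bigr) \\
\text{Posterior:}\quad & p(\beta_0, \beta \mid y, X) \;\propto\;
  \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i} \; p(\beta_0) \prod_{j} p(\beta_j)
\end{align*}
```

The prior means and group-level standard deviations are where expert knowledge can enter, which matches the deck's "expert or machine? both" theme.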

Page 29: Machine Learning and Data Science develop innovative

JAGS code

model {

  # likelihood
  for (i in 1:n) {
    y[i] ~ dbern(prob[i])
    logit(prob[i]) <- (beta_0 + inprod(X[i,], beta) + inprod(X_coemp[i,], beta[1:p_coemp])) / scale[i]
  }

  # prior (dnorm takes a precision, hence 1/sigma^2)
  beta_0 ~ dnorm(0, 1/sigma^2)

  # one prior standard deviation per variable group k
  for (k in 1:n_var) {
    for (j in from[k]:to[k]) {
      beta[j] ~ dnorm(mu[j], 1/(sigma_var[k])^2)
    }
  }

}
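To make the JAGS model concrete, here is a minimal Python sketch that evaluates its unnormalized log posterior for given coefficients. It mirrors the structure of the model (shared co-borrower coefficients, per-record scale, grouped Gaussian priors), but the function names, the 0-based `groups` ranges standing in for from/to, and the toy inputs are assumptions. JAGS itself draws samples from this posterior by MCMC; the sketch only evaluates the density.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_posterior(beta_0, beta, X, X_coemp, y, scale, mu, sigma, sigma_var, groups):
    """Unnormalized log posterior of the model.
    groups[k] = (start, stop) is a 0-based half-open index range into beta,
    standing in for the 1-based from[k]:to[k] of the JAGS code."""
    p_coemp = len(X_coemp[0])
    total = log_normal_pdf(beta_0, 0.0, sigma)        # prior on the intercept
    for k, (start, stop) in enumerate(groups):        # grouped priors on beta
        for j in range(start, stop):
            total += log_normal_pdf(beta[j], mu[j], sigma_var[k])
    for i in range(len(y)):                           # Bernoulli likelihood
        eta = beta_0 + sum(x * b for x, b in zip(X[i], beta))
        # the co-borrower shares the first p_coemp coefficients
        eta += sum(x * b for x, b in zip(X_coemp[i], beta[:p_coemp]))
        eta /= scale[i]
        # log p(y=1) = -softplus(-eta), log p(y=0) = -softplus(eta)
        total += -softplus(-eta) if y[i] == 1 else -softplus(eta)
    return total

# Toy inputs, invented for illustration only.
X = [[1.0, 0.5], [0.2, -1.0]]
X_coemp = [[0.0], [1.0]]
y = [1, 0]
lp_zero = log_posterior(0.0, [0.0, 0.0], X, X_coemp, y, scale=[1.0, 1.0],
                        mu=[0.0, 0.0], sigma=1.0, sigma_var=[1.0],
                        groups=[(0, 2)])
```

With all coefficients at zero, every predicted probability is 0.5, which gives a simple sanity check on the implementation.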

Page 30: Machine Learning and Data Science develop innovative

Coefficients

Page 31: Machine Learning and Data Science develop innovative

Coefficients

Page 32: Machine Learning and Data Science develop innovative

Coefficients

Page 33: Machine Learning and Data Science develop innovative

Coefficients

Page 34: Machine Learning and Data Science develop innovative

Comparison

Page 35: Machine Learning and Data Science develop innovative


@scorelab_io