meetup data-science ovh
Data Organization & Big Data Architecture
Agenda
Data Organization
Big Data Architecture
Recruitment
Data Organization
Line Of Business
HR Finance Sales Customers
Competitors Markets Products Supply
Traffic Acquisition Communication Security Prospects
* If you are reading this, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
Use Line Of Business
LOB 1 (Customer)
BI Team
Data Science Team
LOB 2 (Support)
BI Team
Data Science Team
LOB 3…
BI Team
Data Science Team
Data Office
Data Centralization
Datalake
Cleansing
Data Integration
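The Data Office flow above (centralization in the datalake, then cleansing, then data integration) can be illustrated with a minimal Python sketch of a cleansing step. The field names (`customer_id`, `country`) and the rules (trim whitespace, normalise case, drop empty or duplicate ids) are assumptions chosen for illustration, not OVH's actual cleansing rules.

```python
from typing import Dict, Iterable, List


def cleanse(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalise raw customer rows and drop empty or duplicate ids.

    The fields and rules here are hypothetical, for illustration only.
    """
    seen = set()
    cleaned = []
    for rec in records:
        customer_id = rec.get("customer_id", "").strip().lower()
        country = rec.get("country", "").strip().upper()
        if not customer_id or customer_id in seen:
            continue  # skip rows without an id, and ids already kept
        seen.add(customer_id)
        cleaned.append({"customer_id": customer_id, "country": country})
    return cleaned


raw = [
    {"customer_id": " C-001 ", "country": "fr"},
    {"customer_id": "c-001", "country": "FR"},  # duplicate once normalised
    {"customer_id": "", "country": "DE"},       # no id: dropped
]
print(cleanse(raw))  # [{'customer_id': 'c-001', 'country': 'FR'}]
```

Normalising before deduplicating matters: the first two rows only collide once whitespace and case have been cleaned.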
Data Office
CRM
BI Team
Data Science Team
• Extracts
Data Analyst
• Events
• Actions
Customer Animation
• Product Analysis
• Global Analysis
BUS
• Country Analysis
SUBS
• PAC
• Ad hoc analysis
Digital
• Onsite
• Partner
BIZDEV
• Campaigns
• Text mining
Traffic Acquisition
• Segmentation
• Normalisation
Targeting Channel
In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!
Data Maturity
Level 1: POC
Data are manually created or extracted once
Data are modified by one data scientist
Data are assessed by a data analyst and manually sent to a business analyst post control
Level 2: Manual
Data are manually created on a regular basis
Data are manually added to the enterprise model with an automated process
Data can be used by all data scientists, data analysts or business analysts
Level 3: Automatic
Data are created through a controlled business process
Data are automatically added to the enterprise model
Data can be used by all data scientists, data analysts or business analysts
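The three maturity levels read like a decision rule, so they can be sketched as a small Python classifier. The profile attributes below (`produced_regularly`, `in_enterprise_model`, `from_business_process`, `auto_loaded`) are hypothetical names chosen to mirror the slide's criteria; the slides do not describe any such tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    POC = 1        # Level 1
    MANUAL = 2     # Level 2
    AUTOMATIC = 3  # Level 3


@dataclass
class DatasetProfile:
    # Hypothetical attributes, one per criterion on the slide
    produced_regularly: bool     # created on a regular basis rather than once
    in_enterprise_model: bool    # shared through the enterprise model
    from_business_process: bool  # created by a controlled business process
    auto_loaded: bool            # added to the enterprise model automatically


def maturity(p: DatasetProfile) -> Maturity:
    # Level 3: controlled business process and automatic loading
    if p.from_business_process and p.auto_loaded:
        return Maturity.AUTOMATIC
    # Level 2: regular manual production, shared via the enterprise model
    if p.produced_regularly and p.in_enterprise_model:
        return Maturity.MANUAL
    # Level 1: one-off extract handled by a single data scientist
    return Maturity.POC


print(maturity(DatasetProfile(True, True, False, False)).name)  # MANUAL
```

Using an `IntEnum` keeps the levels ordered, so a check such as "at least Level 2" is a plain comparison.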
Data Maturity Matrix: Customers, Competitors, Products
5 (Advanced): Potential, Strategy
4: Attrition, New Product
3: Churn, Rank
2: Adds, Event
1 (Basic): NIC, Pricing, …
Exploration: Code First
Industrialisation: Model First
Data preparation: 80%
Machine Learning: 20%
Data Scientists
Data Analysts
Business Analysts
Technical model
Analyse / Test
Validation
Data Analysis / Creation
Data Analysis
Data Committee
Data Management Team (Architect + Data Integrator)
DataViz
Model
Enterprise Model Building
Datamart and report building
Business Intelligence Team
DTM Data Prepare: industrialise
Build Datamart and Dashboard
POC
Datastore 360
Expose
POC
Enterprise model
POC Mode
Level 2 & 3 mode
Data Lake Team
Tool / Infrastructure
Data Committee
Objectives
• Define which data needs to be added to the enterprise data
• Define priorities and owners by subject
• Industrialise new data production: from Excel to a full business process
• Validate the enterprise model
– Common vocabulary
– Business and/or functional model
• Stay informed of changes
Participants
Data Scientist, Data Analyst, Business Analyst, Data Management Team
Periodicity
Every month
Datastore 360
EDS 360
History
• Gets all data from:
– Front office applications
– Back office applications
– External data
• Stores data in a business-oriented model
• Is responsible for historizing data when this makes sense for the business
– What data do we want to keep? What will I need in 20 years?
• Exposes data to all applications that require it
– Business Intelligence: reporting or datamart
– Front office applications
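One common way to historize data in a business-oriented model is a slowly changing dimension of type 2: each change closes the current version of a record and opens a new one, so both the history and the current state stay queryable. The slides do not say which technique EDS 360 uses; this is an assumed, in-memory Python sketch with hypothetical names (`HistorizedTable`, `upsert`, `as_of`).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Version:
    key: str                         # business key, e.g. a customer id
    attributes: Dict[str, str]       # attribute values during this period
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


@dataclass
class HistorizedTable:
    rows: List[Version] = field(default_factory=list)

    def current(self, key: str) -> Optional[Version]:
        """Return the open (current) version for a key, if any."""
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                return row
        return None

    def upsert(self, key: str, attributes: Dict[str, str], effective: date) -> None:
        """Close the current version and open a new one only if attributes change."""
        cur = self.current(key)
        if cur is not None:
            if cur.attributes == attributes:
                return  # unchanged: keep the existing version
            cur.valid_to = effective
        self.rows.append(Version(key, dict(attributes), effective))

    def as_of(self, key: str, when: date) -> Optional[Version]:
        """Return the version that was valid on a given date."""
        for row in self.rows:
            if row.key == key and row.valid_from <= when and (
                row.valid_to is None or when < row.valid_to
            ):
                return row
        return None


table = HistorizedTable()
table.upsert("c-001", {"country": "FR"}, date(2016, 1, 1))
table.upsert("c-001", {"country": "DE"}, date(2017, 2, 28))
print(table.as_of("c-001", date(2016, 6, 1)).attributes)  # {'country': 'FR'}
print(table.current("c-001").attributes)                  # {'country': 'DE'}
```

The "what will I need in 20 years?" question maps to the choice of which attributes trigger a new version: historizing everything is simple but grows storage, so only business-relevant changes would normally be tracked.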
Current
Client Product Activity
Client Product Activity
…
…
Data Scientist
Data Analyst
Business Analyst
DataViz
User apps (CRM, Support)
API
API, direct read
Big Data Architecture
Context
~50 SQL replicas
~700 databases
~300K tables
~100 TB
~500K events/s
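To give a sense of scale, the approximate figures above can be turned into back-of-the-envelope numbers. This assumes a constant 500K events/s over a full day, decimal terabytes, and that the ~100 TB is spread evenly over the ~300K tables; the slides claim none of this, and real distributions are almost certainly skewed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Approximate figures from the "Context" slide
events_per_second = 500_000
databases = 700
tables = 300_000
total_bytes = 100 * 10**12  # ~100 TB, assuming decimal terabytes

events_per_day = events_per_second * SECONDS_PER_DAY
tables_per_db = tables / databases
avg_table_bytes = total_bytes / tables

print(f"{events_per_day:,} events/day")                          # 43,200,000,000 events/day
print(f"{tables_per_db:.0f} tables per database on average")      # 429 tables per database on average
print(f"{avg_table_bytes / 10**6:.0f} MB per table on average")   # 333 MB per table on average
```

Roughly 43 billion events a day is the kind of volume that motivates a queue-based ingestion layer rather than direct writes into relational replicas.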
Datalake Hardware view
Private network
OVH Dedicated server
OVH Public Cloud High scalability
Security
Performance
Reliability
Lille Grand Palais – 28 February 2017
Datalake software view
Pig
Flink
Spark
HDFS
HBase
Phoenix
Kafka (Queue)
Couchbase
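The slide lists the components without drawing the data flow. One plausible flow, assumed here rather than stated in the slides, is: events are queued in Kafka, consumed by a streaming job (a role Flink or Spark could play), and aggregated into an HBase-style table keyed by row key. The sketch uses a Python `deque` and a `dict` as in-memory stand-ins and calls no Kafka, Flink or HBase API; `topic`, `store` and `stream_job` are hypothetical names.

```python
from collections import defaultdict, deque

# Stand-in for a Kafka topic: an ordered queue of (customer_id, event_type) records.
topic = deque([
    ("c1", "login"),
    ("c2", "order"),
    ("c1", "order"),
    ("c1", "login"),
])

# Stand-in for an HBase table: row key -> {"family:qualifier" -> value}.
store = defaultdict(dict)


def stream_job(source, sink):
    """Consume every pending record and maintain per-customer event counts."""
    while source:
        customer_id, event_type = source.popleft()
        row = sink[customer_id]
        column = f"count:{event_type}"  # HBase-style "family:qualifier" naming
        row[column] = row.get(column, 0) + 1


stream_job(topic, store)
print(dict(store))
# {'c1': {'count:login': 2, 'count:order': 1}, 'c2': {'count:order': 1}}
```

Keying the sink by customer id mirrors how a wide-column store such as HBase groups all columns of an entity under one row key, which suits the per-customer "360" view described earlier.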
Jobs
Job | Skills | Output
Data Analyst | Excel, Dataviz: Tableau, Power BI | Data strategy
Data Scientist | Scala, Java, R, Python, Cube | Datasets, Flows, Patterns, Models
Data Integrator | Flink, HBase, Pig, Spark | Data preparation
Data DevOps | Kafka, HBase, Go, Apache Beam, … | Datalake
Thank you !
Join us : ovh.com/fr/careers