meetup data-science ovh

Data Organization & Big Data Architecture

Upload: head-of-data

Post on 19-Feb-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Meetup Data-science OVH

Data Organization & Big Data Architecture

Page 2: Meetup Data-science OVH

Data Organization

Big Data Architecture

Recruitment

Agenda

Page 3: Meetup Data-science OVH

Data Organization

Page 4: Meetup Data-science OVH

Line Of Business

HR Finance Sales Customers

Competitors Markets Products Supply

Traffic Acquisition Communication Security Prospects

* If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/

Page 5: Meetup Data-science OVH

Use Line Of Business

LOB 1 (Customer)

BI Team

Data Science Team

LOB 2 (Support)

BI Team

Data Science Team

LOB 3…

BI Team

Data Science Team

Page 6: Meetup Data-science OVH

Data Office

Data Centralization

Datalake

Cleansing

Data Integration

Data Office

CRM

BI Team

Data Science Team

• Extracts

Data Analyst

• Events
• Actions

Customer Animation

• Product Analysis
• Global Analysis

BUS

• Country Analysis

SUBS

• PAC
• Ad hoc analysis

Digital

• Onsite
• Partner

BIZDEV

• Campaigns
• Text mining

Traffic Acquisition

• Segmentation
• Normalisation

Targeting Channel

In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!

Page 7: Meetup Data-science OVH

Data Maturity

Level 1: POC

Data are manually created or extracted once

Data are modified by one data scientist

Data are assessed by a data analyst, then manually sent to a business analyst after the check

Page 8: Meetup Data-science OVH

Data Maturity

Level 1: POC

Data are manually created or extracted once

Data are modified by one data scientist

Data are assessed by a data analyst, then manually sent to a business analyst after the check

Level 2: Manual

Data are manually created on a regular basis

Data are manually added to the enterprise model with an automated process

Data can be used by all data scientists, data analysts or business analysts

Page 9: Meetup Data-science OVH

Data Maturity

Level 1: POC

Data are manually created or extracted once

Data are modified by one data scientist

Data are assessed by a data analyst, then manually sent to a business analyst after the check

Level 2: Manual

Data are manually created on a regular basis

Data are manually added to the enterprise model with an automated process

Data can be used by all data scientists, data analysts or business analysts

Level 3: Automatic

Data are created through a controlled business process

Data are automatically added to the enterprise model

Data can be used by all data scientists, data analysts or business analysts
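
The three levels above can be read as a decision rule on how data are created and how they reach the enterprise model. Below is a minimal sketch of that rule in Python. The names `Creation`, `Integration`, `Dataset` and `maturity_level` are hypothetical and do not come from the slides, and the handling of mixed cases (for example, process-created data that are still added manually) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Creation(Enum):
    ONE_OFF = "manually created or extracted once"
    RECURRING_MANUAL = "manually created on a regular basis"
    BUSINESS_PROCESS = "created through a controlled business process"


class Integration(Enum):
    NONE = "not in the enterprise model"
    MANUAL = "manually added to the enterprise model"
    AUTOMATIC = "automatically added to the enterprise model"


@dataclass(frozen=True)
class Dataset:
    name: str  # hypothetical label, for illustration only
    creation: Creation
    integration: Integration


def maturity_level(ds: Dataset) -> int:
    """Return 1 (POC), 2 (Manual) or 3 (Automatic) for a dataset."""
    # Level 3: controlled creation AND automatic integration.
    if (ds.creation is Creation.BUSINESS_PROCESS
            and ds.integration is Integration.AUTOMATIC):
        return 3
    # Level 2: produced regularly and present in the enterprise model.
    # Assumption: mixed cases that reach the model fall back to level 2.
    if (ds.creation is not Creation.ONE_OFF
            and ds.integration is not Integration.NONE):
        return 2
    # Level 1: everything else is treated as a one-off POC.
    return 1
```

A rule like this could help a data committee list which datasets are still at POC level and decide which ones to industrialise next.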

Page 10: Meetup Data-science OVH

Data Maturity Matrix

Customers Competitors Products

Advanced 5 Potential Strategy

4 Attrition New Product

3 Churn Rank

2 Adds Event

Basic 1 NIC Pricing …

Page 11: Meetup Data-science OVH

Exploration : Code First Industrialisation : Model first

Data Scientists

Data Analysts

Business Analysts

Analyse / Test

Validation

Data Management Team ( Architect + Data Integrator )

Business Intelligence Team

Data Lake Team

Page 12: Meetup Data-science OVH

Data Lake Team

Tool / Infrastructure

Exploration : Code First Industrialisation : Model first

Data Scientists

Data Analysts

Business Analysts

Technical model

Analyse / Test

Validation

Data Management Team ( Architect + Data Integrator )

Business Intelligence Team

Page 13: Meetup Data-science OVH

Tool / Infrastructure

Exploration : Code First Industrialisation : Model first

Data preparation : 80%

Data Scientists

Data Analysts

Business Analysts

Technical model

Machine Learning :

20%

Analyse / Test

Validation

Data Management Team ( Architect + Data Integrator )

Business Intelligence Team

Data Lake Team

Page 14: Meetup Data-science OVH

Tool / Infrastructure

Exploration : Code First Industrialisation : Model first

Data preparation : 80%

Data Scientists

Data Analysts

Business Analysts

Technical model

Machine Learning :

20%

Analyse / Test

Validation

Data Analysis / Creation

Data Analysis

Data Management Team ( Architect + Data Integrator )

DataViz

Model

Business Intelligence Team

POC

Expose

POC

POC Mode

Data Lake Team

Page 15: Meetup Data-science OVH

Tool / Infrastructure

Exploration : Code First Industrialisation : Model first

Data preparation : 80%

Data Scientists

Data Analysts

Business Analysts

Technical model

Machine Learning :

20%

Analyse / Test

Validation

Data Analysis / Creation

Data Analysis

Data Committee

Data Management Team ( Architect + Data Integrator )

DataViz

Model

Enterprise Model Building

Datamart and report building

Business Intelligence Team

DTM

Data Prepare: industrialise

POC

Datastore 360

Level 2 & 3 mode

Expose

POC

Enterprise model

POC Mode

Data Lake Team

Page 16: Meetup Data-science OVH

Tool / Infrastructure

Exploration : Code First Industrialisation : Model first

Data preparation : 80%

Data Scientists

Data Analysts

Business Analysts

Technical model

Machine Learning :

20%

Analyse / Test

Validation

Data Analysis / Creation

Data Analysis

Data Committee

Data Management Team ( Architect + Data Integrator )

DataViz

Model

Enterprise Model Building

Datamart and report building

Business Intelligence Team

DTM

Data Prepare: industrialise

Build Datamart and Dashboard

POC

Datastore 360

Expose

POC

Enterprise model

POC Mode

Level 2 & 3 mode

Data Lake Team

Page 17: Meetup Data-science OVH

Data Committee

Objectives

• Define data that needs to be added to enterprise data
• Define priority and owners by subject
• Industrialise new data production: from Excel to full business process
• Validate enterprise model
– Common vocabulary
– Business and/or functional model
• Be informed of evolution

Participants

Data Scientist, Data Analyst, Business Analyst, Data Management Team

Periodicity

Every month

Page 18: Meetup Data-science OVH

Datastore 360

EDS 360

History

• Gets all data from
– Front office applications
– Back office applications
– External data

• Stores data in a business-oriented model

• Responsible for historizing data when this makes sense for the business
– What data do we want to keep? What will I need in 20 years?

• Exposes data to all applications that require it
– Business Intelligence: reporting or datamart
– Front office applications

Current

Customer Product Activity

Customer Product Activity

Data Scientist

Data Analyst

Business Analyst

DataViz

User APPs(CRM, Support

api

api Direct read
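
The split between a "Current" view and a "History" view can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: the class `Store360` and its methods are hypothetical, the slides do not describe how the Datastore 360 is implemented, and a real system would persist both views in a database rather than in Python dictionaries.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Store360:
    """Toy store with a 'current' view and a 'history' of replaced records."""
    current: Dict[str, dict] = field(default_factory=dict)
    history: Dict[str, List[Tuple[date, dict]]] = field(default_factory=dict)

    def upsert(self, key: str, record: dict, as_of: date) -> None:
        """Set the current record; archive the previous one if it changed."""
        previous = self.current.get(key)
        if previous is not None and previous != record:
            # Keep the replaced version together with the date it was superseded.
            self.history.setdefault(key, []).append((as_of, previous))
        self.current[key] = record

    def read_current(self, key: str) -> Optional[dict]:
        """What an operational application (e.g. a CRM) would read."""
        return self.current.get(key)

    def read_history(self, key: str) -> List[dict]:
        """Past versions, oldest first, for analysis use cases."""
        return [rec for _, rec in self.history.get(key, [])]


store = Store360()
store.upsert("customer-1", {"product": "basic"}, date(2017, 1, 1))
store.upsert("customer-1", {"product": "pro"}, date(2017, 2, 1))
```

After these two calls, `store.read_current("customer-1")` returns the latest record and `store.read_history("customer-1")` returns the version it replaced. Which entities deserve a history at all is the business question raised on the slide.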

Page 19: Meetup Data-science OVH

Big Data Architecture

Page 20: Meetup Data-science OVH

Context

~ 50 SQL replicas

~ 700 DB

~ 300K tables

~ 100TB

~ 500K events/s
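
To give a sense of scale, the approximate figures above can be turned into a couple of derived orders of magnitude. The short calculation below is only back-of-the-envelope arithmetic: it assumes tables are spread evenly across databases and that the event rate is sustained over a full day, neither of which is stated on the slide.

```python
# Approximate figures taken from the slide.
DATABASES = 700
TABLES = 300_000
EVENTS_PER_SECOND = 500_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average number of tables per database (assumes an even spread).
tables_per_db = TABLES / DATABASES

# Events per day (assumes the rate is sustained around the clock).
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

print(f"~{tables_per_db:.0f} tables per database on average")
print(f"~{events_per_day / 1e9:.1f} billion events per day")
```

That works out to roughly 429 tables per database and about 43.2 billion events per day.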

Page 21: Meetup Data-science OVH

Datalake Hardware view

Private network

OVH Dedicated server

OVH Public Cloud High scalability

Security

Performance

Reliability

Page 22: Meetup Data-science OVH

Lille Grand Palais – 28 February 2017

Page 23: Meetup Data-science OVH

Datalake software view

Pig

Flink

Spark

HDFS

HBase

Phoenix

Kafka (Queue)

Couchbase

Page 24: Meetup Data-science OVH

Jobs

Job | Skills | Output
Data Analyst | Excel; Dataviz: Tableau, PowerBI | Data strategy
Data Scientist | Scala, Java, R, Python, Cube | Datasets, Flows, Patterns, Models
Data Integrator | Flink, HBase, Pig, Spark | Data preparation
Data Dev Ops | Kafka, HBase, Go, Apache Beam, … | Datalake

Page 25: Meetup Data-science OVH

Thank you!

Join us: ovh.com/fr/careers