Capgemini Leap Data Transformation Framework with Cloudera


Uploaded by: capgemini

Posted on: 16-Apr-2017


TRANSCRIPT

Page 1: Capgemini Leap Data Transformation Framework with Cloudera

© Cloudera, Inc. All rights reserved.

Copyright © Capgemini 2016. All Rights Reserved.

Is your Big Data journey stalling?

Take the Leap with Capgemini and Cloudera

Industrializing your transition to the Modern Data Landscape

Page 2

Speakers

Andrea Capodicasa

Senior Solution Architect

Insights & Data

Goutham Belliappa

Big Data practice leader

Insights & Data

Alex Gutow

Senior Manager, Product Marketing

Page 3

Agenda

• The Case for Change

• Industrializing the Change

• Adoption

• Q&A

Page 4

Capgemini Insights & Data Global Practice

Global reach with over 13,000 professionals across 40+ countries

With over 500 Big Data & Data Science professionals, including 100+ Hadoop-certified consultants

We employ more than 13,000 information management specialist practitioners, deployed across Capgemini’s global network

We were recognised again by Gartner as one of the 4 leading information service providers globally

Capgemini Insights & Data Global Practice since 2015, delivering business & IT insights and data services

Capgemini has a global reach and local presence in 44 countries and over 100 languages

Page 5

The case for change

Page 6

Information Trends: What are we seeing in the marketplace?

Recent years have brought unprecedented changes to the information landscape. Each of these “disruptors” has individual momentum, and collectively they represent a significant opportunity to improve an organization’s effectiveness.

Successful CIOs and leaders consciously take these trends into consideration when planning the evolution of their information architecture.

Empower the business by focusing from the “user down”, not the “system up”.

• Ms. Agility killed Mr. Waterfall: the days of modeling business requirements months or even years in advance, with IT delivering a multi-year plan to roll out a solution that may no longer fit a fast-changing business environment, are long gone.

• Cloud Computing: the availability of “finished” business functions in the cloud provides organizations with tremendous opportunities while increasing IT information challenges.

• Open Source: open source architecture provides substantial savings in development and complexity costs vs. legacy software packages.

• As a Service: Software as a Service offerings in Big Data, data transformation and finished analytics are removing the infrastructure bottlenecks of servers, software and maintenance that obstruct speed to market.

• Security & Privacy: the proliferation of web-connected IP devices creates a “hyper-evolving” cyber-breach potential for organizations, while privacy laws create compliance challenges with mobile devices.

• Business Meta-Data: traditionally, data dictionaries have been single-purpose and technically focused. As data becomes more valuable and the same information is used in multiple ways, the need for business metadata will become critical.

• Social Computing: social computing has produced data whose segments are loosely connected and whose correlations are at times non-intuitive, requiring new ways to mine it and derive insights.

• In Memory Analytics: massive in-memory databases running intensely complex analytics are highly scalable, making it possible to change anything, anytime, and simultaneously compare the results of multiple scenarios in seconds.

• “Real” Analytics: the transition from historical (hindsight) indicators to insight and foresight indicators and visualizations.

Page 7

Customers are Looking for a Guide

Page 8

Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure

A new kind of data platform
• One place for unlimited data
• Unified, multi-framework data access

Cloudera makes it
• Fast for business
• Easy to manage
• Secure without compromise

Hybrid Deployment Flexibility: Public Cloud, Private Cloud, Hybrid Environments

[Diagram: Operations and Data Management alongside Unified Services (Resource Management, Security); Store (Filesystem, Relational, NoSQL, Other); Integrate; Process, Analyze, Serve (Batch, Stream, SQL, Search, Other); for structured and unstructured data]

Page 9

The traditional approach to BI & Analytics is a bottleneck in the operational value chain

Traditional BI & Analytics approach

• Centralised BI teams are too monolithic and divorced from business operations
• Insights latency
• Reporting on the past, with limited ability to predict and prescribe what is needed now
• Each new business question asked = more time required to crunch the right data
• Heavy duplication of operational data throughout the BI layers & systems
• Diluted data quality & governance create risks of security breaches, compliance issues & risk exposure
• Significant costs: infrastructure and people
• Limited ability to scale, whether from organic growth in data volumes or from increasing data complexity

Page 10

The Insights-driven enterprise puts information at the centre and insights “at the point of action”

Next Generation approach

• Next-generation data management platform enabling a pervasive, real-time “insights & data fabric” serving operations
• Standardized & cost-effective data management, allowing high agility on insights and the ability to “ask any question”
• Operational applications provide data and integrate insights back in a continuous improvement loop
• Operations integrate predicted best outcomes to optimise business processes, automatically where possible
• Ability to detect and catch events on the fly that require immediate action (e.g. fraud detection), for optimal reaction or proactive action
• Coherent management of platforms & data management processes, with insights & data science skills embedded directly in the operational units for maximum impact
• Optimized total cost of ownership (TCO) with a rationalized and simplified data landscape

Page 11

[Diagram: Operations, Data Management, Unified Services, Process/Analyze/Serve, Store, Integrate]

Key challenges blur the vision on both the target and the journey to the Insights-driven enterprise

Challenges addressed

• “Which data should we retain and/or which data could we archive?”
• “I don’t know how to drive value from my data.”
• “Can I decrease costs by moving my data (landscape) to the cloud or As-A-Service?”
• “How mature is my data landscape in comparison to the best industry trends?”
• “I have been told to ‘do something’ about big data analytics but don’t know where to start.”
• “Can the Business Intelligence landscape be optimized to derive the maximum value out of it?”
• “Our data landscape is scattered, complex and very expensive. Can we fix it?”

Value created

A modern data strategy will enable:
• Reduced complexity: rationalizing the data strategy to meet demand
• Lower cost: reducing the operating cost of your data strategy
• Increased agility and better time to market: more speed in the development of new information applications
• More/better insights and return on intelligence: easier derivation of meaningful insights, enabling business transformation
• Less risk: reducing the complexity of the data strategy
• Data security & privacy: making your data strategy compliant with rules and regulations

Page 12

Industrializing the change

Page 13

Misura, Diligent, Idem, Blend, Papillon, Virtu

Capgemini’s Leap Data Transformation Framework: Modules Overview

Essence (Semantic Layer consolidation)
• Analyze the existing semantic layer of the architecture
• Identify potential functional overlap and produce recommendations for consolidation

Data concierge
• Business Information Catalog
• Self-service ingestion, distillation, analytics
• Data Operations Services

One-Click leap
• Agile environment provisioning
• Continuous Integration lifecycle

Estimation, Discovery, Design/Build, Testing
• Optimize/reduce transformation scope
• Optimize reporting design
• Optimize SQL
• Industrialize end-to-end testing
• Estimate the transformation effort
• Optimize ETL semantic design

Page 14

Diligent / Blend Applications

Business Problem

• Large and complex DW estates have been built over the last 20 years or so, and the infrastructure hosting them may need updating
• A number of reports and underlying tables will be duplicated or no longer used; they can be decommissioned, saving valuable resources
• Users are reluctant to give up “their” reports/data when migration programmes occur

Solution

• Scope reduction by identifying current BO (SAP BusinessObjects) reports that are not used: up to 40% discovered at one of our customers
• Scope reduction by identifying reports that are duplicates or share a number of data items
• Automated method to migrate BO reports to Pentaho, reducing both workload and errors
• A scientific and objective approach to measuring which data are actually used
• Diligent BO Audit data explorer to identify interactions between users and Universes / Reports and tables
• Diligent BO Metadata gathering module to extract Universe and report information
• Blend Report merger to identify opportunities for report reduction
• Blend XML Generator to create Pentaho reporting cubes from the metadata gathered by Diligent
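The two scope-reduction ideas above (finding reports nobody opens, and finding reports that overlap heavily in the data items they use) can be illustrated with a minimal Python sketch. It assumes audit events and report metadata have already been extracted into simple in-memory records; the `AuditEvent` layout, the 90-day idle window and the overlap threshold are hypothetical choices for illustration, not Diligent's or Blend's actual logic or the BusinessObjects audit schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import combinations


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record: one user opening one report (not the BO audit schema)."""
    report_id: str
    user: str
    accessed_on: date


def unused_reports(report_items, events, today, max_idle_days=90):
    """Reports with no recorded access in the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    recently_used = {e.report_id for e in events if e.accessed_on >= cutoff}
    return sorted(set(report_items) - recently_used)


def jaccard(a, b):
    """Share of data items two reports have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_candidates(report_items, threshold=0.8):
    """Report pairs whose data-item sets overlap by at least `threshold`."""
    pairs = []
    for (r1, items1), (r2, items2) in combinations(sorted(report_items.items()), 2):
        score = jaccard(items1, items2)
        if score >= threshold:
            pairs.append((r1, r2, round(score, 2)))
    return pairs


if __name__ == "__main__":
    reports = {
        "sales_by_region": {"region", "revenue", "quarter"},
        "sales_by_region_v2": {"region", "revenue", "quarter", "currency"},
        "hr_headcount": {"department", "headcount"},
    }
    events = [
        AuditEvent("sales_by_region", "alice", date(2016, 11, 2)),
        AuditEvent("hr_headcount", "bob", date(2016, 3, 15)),
    ]
    print(unused_reports(reports, events, today=date(2016, 12, 1)))
    # ['hr_headcount', 'sales_by_region_v2']
    print(merge_candidates(reports, threshold=0.7))
    # [('sales_by_region', 'sales_by_region_v2', 0.75)]
```

Jaccard overlap is used here only because it is a simple, explainable measure of "shares a number of data items"; the deck does not say which measure Blend uses.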


Accelerator Results

Page 15

IDEM-DA

Business Problem

• The customer has very strict security and normalisation requirements when loading their data: they need different obfuscation types for different “semantic types”, e.g. names, phone numbers, social security numbers, etc.
• Left as a manual activity, this would require a laborious and time-consuming identification of hundreds of thousands of columns: a costly and error-prone activity

Solution

• Automated identification of table columns for encryption and standardisation
• Automated creation of the ETL metadata spreadsheets that drive the Data Acquisition Pentaho jobs for data migration
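Once each column has a semantic type, choosing an obfuscation per type and writing the ETL metadata sheet is mostly a lookup. The Python sketch below assumes a hypothetical rule table (`OBFUSCATION_RULES`, masking for mobile numbers, a salted SHA-256 token for emails and names, pass-through otherwise) and a CSV layout of our own invention; it is not IDEM-DA's format, and salted hashing is only a stand-in for the real encryption or tokenisation a production pipeline would use.

```python
import csv
import hashlib
import io


def mask_keep_last(value, keep=3):
    """Replace every character except the last `keep` with '*'."""
    hidden = max(len(value) - keep, 0)
    return "*" * hidden + value[hidden:]


def salted_token(value, salt="demo-salt"):
    """Deterministic pseudonym via salted SHA-256 (a stand-in, not real encryption)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


# Hypothetical rule table: semantic type -> (rule name written to the sheet, function).
OBFUSCATION_RULES = {
    "MOBILE_NO": ("mask_keep_last_3", lambda v: mask_keep_last(v, 3)),
    "EMAIL": ("salted_token", salted_token),
    "NAME": ("salted_token", salted_token),
    "UNKNOWN": ("pass_through", lambda v: v),
}


def rule_for(semantic_type):
    """Look up the rule for a type, falling back to the UNKNOWN rule."""
    return OBFUSCATION_RULES.get(semantic_type, OBFUSCATION_RULES["UNKNOWN"])


def obfuscate(value, semantic_type):
    _, fn = rule_for(semantic_type)
    return fn(value)


def build_metadata_sheet(column_types):
    """column_types: iterable of (table, column, semantic_type); returns CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["table", "column", "semantic_type", "obfuscation"])
    for table, column, semantic_type in column_types:
        rule_name, _ = rule_for(semantic_type)
        writer.writerow([table, column, semantic_type, rule_name])
    return buf.getvalue()


if __name__ == "__main__":
    print(build_metadata_sheet([("customers", "mob_no", "MOBILE_NO"),
                                ("customers", "serial_id", "UNKNOWN")]))
    print(obfuscate("07710232931", "MOBILE_NO"))  # ********931
```

Falling back to pass-through for unrecognised types keeps the example short; a stricter pipeline might instead flag such columns for human review.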

Accelerator Results

• Manual generation of the metadata spreadsheet: several days to weeks
• IDEM-DA: 15 minutes to 2 hours
• Manual eyeballing of data is prone to human error and can take hours to several days
• IDEM-DA: approximately 70% reduction, with more accurate identification of known types

Project manager of the Data Migration project: “IDEM-DA is the only way forward”


Page 16

Example table (IDEM-DA)

Column Name        Dataset                                            Semantic Type
mob_no             07710232931, 07083210302                           MOBILE_NO
email              [email protected], [email protected]        EMAIL
free_text_field    My address is 12 lucky street, London, E12 2TF     Address
serial_id          11234, 22313, 3231313                              UNKNOWN

IDEM-DA is a module used to support the ETL from legacy data warehouses into modern architectures.
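The example table can be reproduced with a simple rule-based classifier: test each sampled value against a few patterns, then let the column take the majority type. This Python sketch assumes UK-style mobile numbers and postcodes, uses made-up email addresses (the originals are redacted), and the `RULES` list and 0.8 majority threshold are hypothetical; the deck does not describe IDEM-DA's actual detection rules.

```python
import re
from collections import Counter

# Hypothetical detection rules, tried in order: (semantic type, pattern, match mode).
RULES = [
    ("MOBILE_NO", re.compile(r"07\d{9}"), "fullmatch"),  # UK-style mobile number
    ("EMAIL", re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"), "fullmatch"),
    # A UK-style postcode anywhere in free text, e.g. "E12 2TF".
    ("Address", re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b"), "search"),
]


def value_type(value):
    """Semantic type of a single value, or UNKNOWN if no rule matches."""
    text = value.strip()
    for semantic_type, pattern, mode in RULES:
        if getattr(pattern, mode)(text):  # pattern.fullmatch(text) or pattern.search(text)
            return semantic_type
    return "UNKNOWN"


def classify_column(samples, min_share=0.8):
    """Majority vote over sampled values; UNKNOWN unless one type reaches min_share."""
    values = [s for s in samples if s.strip()]
    if not values:
        return "UNKNOWN"
    best, count = Counter(value_type(v) for v in values).most_common(1)[0]
    return best if count / len(values) >= min_share else "UNKNOWN"


if __name__ == "__main__":
    columns = {
        "mob_no": ["07710232931", "07083210302"],
        "email": ["jane.doe@example.com", "j.smith@example.org"],  # made-up addresses
        "free_text_field": ["My address is 12 lucky street, London, E12 2TF"],
        "serial_id": ["11234", "22313", "3231313"],
    }
    for name, samples in columns.items():
        print(name, classify_column(samples))
```

A majority vote over samples tolerates the occasional dirty value in a column, which a "first value wins" rule would not.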


Page 17

IDEM-ES

Business Problem

• The customer has a load pattern called “cutover+delta”: historical tables are updated with daily files
• Although many tables share columns with similar names, matching them by hand would require a time-consuming identification of hundreds of thousands of columns: an error-prone activity

Solution

• Machine-learning-based solution to automatically identify similarity between columns (human-supervised), using:
  – Column name similarity (n-grams)
  – Column content similarity (n-grams)
  – Content-agnostic column distribution (histograms)
• Open architecture to automatically evaluate the best model (600+ tested)
• Automated creation of INSERT INTO ETL scripts
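The three similarity signals listed above can be combined into a single score per column pair. The Python sketch below uses character trigram Jaccard similarity for names and contents, and a histogram of value lengths as one possible content-agnostic distribution; the length bins, the weights `(0.5, 0.3, 0.2)` and the 0.3 cut-off are arbitrary assumptions, whereas the deck reports that the real solution evaluated 600+ models to choose its approach.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a lower-cased string, padded with '#' at both ends."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def content_ngrams(values, n=3):
    grams = set()
    for v in values:
        grams |= char_ngrams(v, n)
    return grams


def length_histogram(values, edges=(0, 4, 8, 12, 16, 32)):
    """Normalised histogram of value lengths: a content-agnostic column profile."""
    counts = Counter(max(i for i, e in enumerate(edges) if len(v) >= e) for v in values)
    total = sum(counts.values()) or 1
    return {b: c / total for b, c in counts.items()}


def histogram_overlap(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))


def column_similarity(name_a, values_a, name_b, values_b, weights=(0.5, 0.3, 0.2)):
    """Weighted blend of name, content and distribution similarity (weights are arbitrary)."""
    w_name, w_content, w_dist = weights
    return (w_name * jaccard(char_ngrams(name_a), char_ngrams(name_b))
            + w_content * jaccard(content_ngrams(values_a), content_ngrams(values_b))
            + w_dist * histogram_overlap(length_histogram(values_a), length_histogram(values_b)))


def suggest_mapping(source_cols, target_cols, min_score=0.3):
    """Propose the best target column for each source column, for a human to confirm."""
    suggestions = {}
    for s_name, s_vals in source_cols.items():
        score, best = max((column_similarity(s_name, s_vals, t_name, t_vals), t_name)
                          for t_name, t_vals in target_cols.items())
        suggestions[s_name] = (best if score >= min_score else None, round(score, 2))
    return suggestions


if __name__ == "__main__":
    daily = {"cust_mob_no": ["07710232931"], "cust_email_addr": ["jane@example.com"]}
    history = {"customer_mobile_no": ["07083210302", "07999999999"],
               "customer_email": ["j.smith@example.org"],
               "serial_id": ["11234"]}
    print(suggest_mapping(daily, history))
```

Returning a suggestion plus its score, rather than applying it, mirrors the "humanly supervised" step: a reviewer confirms or rejects each proposed mapping.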

Accelerator Results

• Expected acceleration of around 30-50%
• Can automatically generate SQL INSERT statements to create the current view
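Once a column mapping has been confirmed, generating the INSERT statement is straightforward string assembly. This Python sketch assumes a hypothetical `{source_column: target_column}` mapping (with `None` for columns left unmapped) and produces generic `INSERT INTO ... SELECT` SQL; identifiers are validated against a conservative pattern rather than quoted, because quoting rules differ between systems such as Oracle and Impala.

```python
import re

# Plain or schema-qualified identifiers only; anything else is rejected rather than quoted.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_insert(target_table, source_table, mapping):
    """mapping: {source_column: target_column or None}; None means 'not mapped, skip'."""
    pairs = [(src, tgt) for src, tgt in mapping.items() if tgt is not None]
    if not pairs:
        raise ValueError("no mapped columns")
    names = [target_table, source_table] + [name for pair in pairs for name in pair]
    for name in names:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    target_cols = ", ".join(tgt for _, tgt in pairs)
    source_cols = ", ".join(src for src, _ in pairs)
    return (f"INSERT INTO {target_table} ({target_cols})\n"
            f"SELECT {source_cols}\n"
            f"FROM {source_table};")


if __name__ == "__main__":
    print(build_insert("dw.customer_history", "staging.customer_daily",
                       {"cust_mob_no": "customer_mobile_no",
                        "cust_email_addr": "customer_email",
                        "load_ts": None}))
```

Column order in the generated statement follows the mapping's insertion order, so source and target lists always line up pairwise.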


Page 18

IDEM-ES

Page 19

IDEM-ES

Page 20

Virtu – Data Testing Framework

Business Problem

• Testing data migrations, and more generally the integrity of data transformations in large-scale BI/DW estates, is complicated
• Thousands of objects are moved during the migration and, once in production, loaded every day; this can lead to hundreds of defects, and without an automated system keeping track of them all becomes a daunting task
• Continuous monitoring of DQ (data quality) performance and regression error history is essential to maintain acceptable levels of quality

Solution

• A complete end-to-end (e2e) testing framework that accelerates the configuration, execution and evaluation of tests for large-scale BI domains
• Comprising a Web UI for maximum user friendliness in configuration
• Scheduler engine to launch configurable batches of tests
• Real-time defect manager for timely defect issuing and progress checks
• DQ dashboard for monitoring state and progress

Benefits

• Customers can easily plan and execute a large number of checks, with complete control over their lifecycle (creation, modification, decommissioning)
• Configurable engine to store defect details, for maximum visibility and transparency on errors and their resolutions
• Native connection to modern defect management systems (Jira), easily extendable to any system with a reachable API
• DQ dashboard gives real-time, drillable information on the current DQ state
• Compatible with 3 system types: Oracle, Impala & MySQL
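The core of a data testing framework like the one described is a batch of reconciliation checks whose failures become defect records. The Python sketch below uses an in-memory SQLite database with `src_*` and `tgt_*` tables standing in for the legacy and migrated systems; the `Check` and `Defect` structures and the check names are hypothetical, and Virtu's Web UI, scheduler and Jira integration are not modelled.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Check:
    """One reconciliation check: each query must return a single scalar."""
    name: str
    source_sql: str
    target_sql: str


@dataclass
class Defect:
    check: str
    expected: object
    actual: object
    raised_at: str


def run_batch(conn, checks):
    """Run a batch of checks; return names that passed and Defect records for the rest."""
    passed, defects = [], []
    for chk in checks:
        expected = conn.execute(chk.source_sql).fetchone()[0]
        actual = conn.execute(chk.target_sql).fetchone()[0]
        if expected == actual:
            passed.append(chk.name)
        else:
            stamp = datetime.now(timezone.utc).isoformat()
            defects.append(Defect(chk.name, expected, actual, stamp))
    return passed, defects


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # src_* stands in for the legacy system, tgt_* for the migrated target.
    conn.executescript("""
        CREATE TABLE src_orders (id INTEGER, amount REAL);
        CREATE TABLE tgt_orders (id INTEGER, amount REAL);
        INSERT INTO src_orders VALUES (1, 10.0), (2, 20.5), (3, 7.25);
        INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.5);
    """)
    checks = [
        Check("orders_row_count", "SELECT COUNT(*) FROM src_orders",
              "SELECT COUNT(*) FROM tgt_orders"),
        Check("orders_amount_total", "SELECT ROUND(SUM(amount), 2) FROM src_orders",
              "SELECT ROUND(SUM(amount), 2) FROM tgt_orders"),
        Check("orders_min_id", "SELECT MIN(id) FROM src_orders",
              "SELECT MIN(id) FROM tgt_orders"),
    ]
    passed, defects = run_batch(conn, checks)
    print("passed:", passed)  # passed: ['orders_min_id']
    for d in defects:
        print("DEFECT", d.check, "expected", d.expected, "got", d.actual)
```

Keeping checks as data (name plus two queries) is what lets a scheduler run them in configurable batches and lets a defect manager track each failure by name.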

Page 21

Virtu – Testing Framework

Page 22

Virtu – Testing Framework

Page 23

Adoption

Page 24

Leap Data Transformation Framework is the result of a client co-innovation process and delivered efficiencies on large projects

• A Capgemini client in the Public Sector is building a Business Data Lake (BDL) to support interactions across all digital channels and to rationalize/optimize its legacy IT Business Intelligence landscape on top of the new Big Data architecture
• In the scope of the IT Rationalization project, 10+ data warehouses, hundreds of analytical business services and thousands of BO reports must be moved on top of the BDL, for thousands of business users throughout the organization
• In this context, the Leap Data Transformation Framework was used on a first business scope
• Leap is a framework consisting of a transformation methodology and accelerators across the transformation lifecycle, able to operate at scale:
  – The methodology is modular and covers all phases of transformation
  – Elements of the Discovery phase were automated
  – Design and Build process automation (metadata driven) and application deployment controls delivered development efficiencies and scalability
  – A metadata-driven test automation framework reduced initial test effort and subsequent regression test activities
  – A Continuous Development process
  – Platform application stack deployment efficiencies
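Metadata-driven design/build automation, as listed above, can be illustrated by generating target DDL directly from source-column metadata. This Python sketch assumes an Oracle-style source and an Impala target stored as Parquet; `ColumnMeta`, the table name and the type mapping in `map_type` are hypothetical simplifications, not Leap's actual rules.

```python
import re
from dataclasses import dataclass


@dataclass
class ColumnMeta:
    name: str
    source_type: str  # e.g. "VARCHAR2(50)", "NUMBER(10,2)", "DATE"


def map_type(source_type):
    """Hypothetical Oracle-to-Impala type mapping, for illustration only."""
    t = source_type.strip().upper()
    m = re.fullmatch(r"NUMBER\((\d+)(?:,\s*(\d+))?\)", t)
    if m:
        precision, scale = int(m.group(1)), int(m.group(2) or 0)
        # Integers of up to 18 digits fit in a 64-bit BIGINT.
        return "BIGINT" if scale == 0 and precision <= 18 else f"DECIMAL({precision},{scale})"
    if t == "NUMBER":
        return "DECIMAL(38,10)"
    if t.startswith(("VARCHAR2", "NVARCHAR2", "CHAR", "CLOB")):
        return "STRING"
    if t == "DATE" or t.startswith("TIMESTAMP"):
        return "TIMESTAMP"  # Oracle DATE carries a time component
    raise ValueError(f"no mapping for {source_type!r}")


def build_ddl(table, columns):
    """Render a CREATE TABLE statement for the target from column metadata."""
    cols = ",\n".join(f"  {c.name} {map_type(c.source_type)}" for c in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{cols}\n)\nSTORED AS PARQUET;"


if __name__ == "__main__":
    print(build_ddl("bdl.orders", [ColumnMeta("order_id", "NUMBER(12)"),
                                   ColumnMeta("amount", "NUMBER(10,2)"),
                                   ColumnMeta("customer_name", "VARCHAR2(100)"),
                                   ColumnMeta("ordered_at", "DATE")]))
```

Raising an error for unmapped types, rather than guessing, keeps gaps in the metadata visible to the team instead of silently producing a wrong schema.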

Approach / Key Outcomes

Accelerator Results

An end-to-end, fact-based transformation framework to deliver IT Rationalization on top of Big Data architectures

• 40% reduction of the transformation scope (Diligent)
• 15% efficiency in the design/build process (Idem, Papillon, Blend) through use of:
  – Semi-automated ETL code optimizer
  – Semi-automated SQL optimizer
  – Semi-automated report optimizer
• 10% efficiency in the test development process (first pass) & 30% efficiency in regression testing (Virtu) through:
  – Automated test & assurance framework

Page 25

Use cases for Capgemini’s Leap Data Transformation Framework for optimized business data lakes

Replatforming
• For advanced clients embracing the potential of modern architectures
• Opportunity to transform, simplify and rationalize an organization’s data landscape for optimized TCO
• The full Leap Data Transformation suite enables risk and cost reduction and works well in an agile approach

Legacy Discovery/DW optimization
• For clients in need of better visibility of their current data assets before moving to Big Data
• Leap Data Transformation Framework can help optimize current data management processes, substantially reduce transformation scope, identify the optimal platform for the workloads and shape a future project for success

Managing existing BI & moving to modern architectures
• Capgemini takes over the current BI estate and modernizes it through its NextGen BISC approach
• For clients with redundant and expensive DW estates concerned about the risks of moving to modern architectures
• The full Leap Data Transformation Framework suite is a key element to optimize TCO and ensure quality in the transformation process

Testing
• For clients needing to automate their data testing in big data environments or large relational environments
• Tools can automate the testing lifecycle for both big data and traditional relational DW estates

Page 26

Replatforming legacy BI applications requires strong strategies for user adoption and decommissioning

THE USER ADOPTION STRATEGY

Strong user adoption strategy
• End users understand the new value they will get out of the new system
• They are empowered to use it
• Their success is spreading to new initiatives
• They forget all about the old & slow stuff fairly quickly

Weak user adoption strategy
• End users fear the new system will impact their capacity to do their jobs
• The known is safer than the new
• First tests on the new systems disappoint; any failure goes viral
• Enhancements still run on the old system, “just in case”

THE KILL STRATEGY

Strong kill strategy
• Systems are killed according to the roadmap; costs linked to unused HW & SW are recovered
• IT & Business impacts are anticipated, managed and communicated
• The energy is focused on the new

Weak kill strategy
• First systems are shut down ignoring business constraints, impacting operations
• Endless hours are spent comparing the old and the new and explaining differences
• Unprepared board escalations when unplanned impacts arise

Page 27

Sample table of contents for the output of a 4-week Data Warehouse Optimization roadmap based on Leap

• Data Extract & Staging
• Data Management & EDW
• Semantic Layer
• Sandbox & Analytics
• Operational Analytics
• Data Virtualization Layer
• Master Data Management
• Metadata Management
• Data Distribution Layer
• Our Understanding
• Big Data Trends in Heavy Equipment/Farm Industry
• Technology Principles
• Reference Architecture
  – Conceptual Architecture
  – Architecture Components
• Technology Choice Points
  – ETL tool comparison
  – EMR vs. Hadoop
• ETL & Data Offloading Plan
  – Project Structure, Sequence, Sprints
  – Assumptions
  – Collaborative Planning & Prep
• Logical Architecture
• Business Value Proposition
• Current State Architecture
• End State Architecture
• Current State + 6 months Architecture
• Current State + 12 months Architecture
• Current State + 18 months Architecture
• Data Distribution Layer

Page 28

What’s next?

Page 29

Contact our experts

Schedule a discovery session with our experts

Schedule a first assessment of the value of Leap for your organization

Goutham Belliappa

[email protected]

https://www.linkedin.com/in/gouthambelliappa

Andrea Capodicasa

[email protected]

Duane Garrett

[email protected]