Exposition of concept and context; discussion of repercussions

SCOTT LEE, Data Principal, EMC² Global Professional Services
+1-312-497-8853 | [email protected]

DAMA Chicago, 15 April 2015


Copyright © 2015 Scott Lee. All rights reserved.

• Introductions, assumptions
• Problem statement
• Background & definitions
• Contrasting Data-aaS and EDW
  – Delivery models
  – Processes
  – Actors / motivations
• What this means for you
• Q&A


SEE MORE COMPLETELY: “I want to see all of the data – not just monthly rolling aggregates.”

ANALYZE MORE DEEPLY: “I want to understand deep correlations and be able to see what might happen if I change something.”

ACT MORE PRECISELY: “I want to surgically target my client actions to produce the most value possible.”


OPEN
• Connect, leverage, provision
• Exposed data catalogue

SEMANTIC
• Conceptual granularity
• Emphasizes meaning, not model

CURIOUS
• Data discovery, on-boarding
• Automated stewardship tasking

MAGNETIC
• Simple walk-up data provisioning
• Easy ingest attracts best data

AGILE
• Embracing constant change
• Avoid brittle infrastructure


Scale

Exponential data growth cannot be matched with linear personnel growth; the gap must be addressed

Complexity

External data, metadata, unstructured data, and ever increasing use cases and contexts

Increasing Expectations

Executives and LOB information workers becoming more savvy, more demanding

BI and DW patterns are Artisanal; Big Data needs to be Industrial…


[Figure: information workloads plotted by accuracy / veracity (from 50%, "best guess", to 100%, "no error tolerance") against speed / latency (from 1 μs to 1 year). Workloads shown: Stream & Transaction Processing, Continuous Event Processing, Business Activity Monitoring, Operational Reporting, Executive Dashboard, Data Warehouse, Financial Reporting, Strategic Planning, Long-term Trending, Analytic Modeling, Information Research & Development.]

Key Observation

I would often rather have a directionally correct answer in five minutes than a guaranteed correct answer in six months.


• Rigid model makes change complex
• Lost context and business meaning for sophisticated analytics
• Integration of unstructured data
• Time to move large data volumes
• High cost of redundant infrastructure
• Expensive infrastructure and software


A self-contained data platform; provided on-demand; bundling data and software for access and interpretation in a single package

Ease: Simplicity of data access into a single model without requiring knowledge of underlying data objects, integration, etc.

Agility: Immediate access yields accelerated prototyping time and faster solution time-to-market

Cost-effectiveness: Offsets cost of managing and housing complex data sets separately / redundantly

Quality: Single point of update, collaborative data management from business & IT

[Diagram labels: Data Container, Data Adapter, Models, Analytics, Apps, Other data, Users]


In DaaS, the unit of service is leased access rights for a specific Data Asset


Traceability through data lineage across all data processing steps and stages

Profiled and baselined data quality, by field; vetted actively by knowledgeable librarian

Contextualized and presented with metadata sufficient for business understanding, leverage

Searchable: indexed upon ingestion, maintained in business data catalog; as easy to find as any retail product on Amazon.com

Secured against all inappropriate and wrongful access through trust policies, masking, encryption, ACLs, and audit

Image credit: Knowledgent infographics


Traditional BI / DW: 3-6+ months

Req’s & design
• Specify data required in Report
• Document timing, delivery, lag requirements
• Map fields to DW data model
• Identify data gaps

Build solution
• Back-trace gaps to data sources
• Extend DW data model to house new elements
• Build ETL to load sources into DW
• Create report showing required data using SQL

Deliver BI report

Data as-a-Service (DaaS): 2 months … 2 minutes

Research
• Log in to Data Catalog
• Search or browse for relevant Data Assets
• Initiate Lease for selected Data Assets
• Provision leased data to private Analytic Sandbox

Explore data
• Use any tools to access data
• Navigate metadata schema details
• Link data together, filter, aggregate
• Develop analytical models

Find answer; iterate
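
The DaaS research path on this slide (search the catalog, lease an asset, provision it to a private sandbox) can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the implementation described in the talk: the `DataAsset`, `Sandbox`, and `Catalog` names, their fields, and the sample assets are all hypothetical, and lease approval is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical catalog entry: name, business description, search tags."""
    name: str
    description: str
    tags: frozenset = frozenset()


@dataclass
class Sandbox:
    """Hypothetical private analytic sandbox; holds names of leased assets."""
    owner: str
    leased: list = field(default_factory=list)


class Catalog:
    """In-memory stand-in for a business data catalog."""

    def __init__(self, assets):
        self._assets = {asset.name: asset for asset in assets}

    def search(self, term):
        """Return assets whose name, description, or tags contain the term."""
        needle = term.lower()
        return [
            asset for asset in self._assets.values()
            if needle in asset.name.lower()
            or needle in asset.description.lower()
            or any(needle in tag.lower() for tag in asset.tags)
        ]

    def lease(self, user, asset_name, sandbox):
        """Provision a leased asset into the requesting user's own sandbox."""
        if asset_name not in self._assets:
            raise KeyError(asset_name)
        if sandbox.owner != user:
            raise PermissionError("sandbox belongs to another user")
        sandbox.leased.append(asset_name)
        return sandbox


# Sample data (made up for illustration).
catalog = Catalog([
    DataAsset("vendor_master", "Vendor names and credit terms", frozenset({"vendor"})),
    DataAsset("sales_orders", "Order lines by customer", frozenset({"sales"})),
])
hits = catalog.search("vendor")
sandbox = catalog.lease("robert", hits[0].name, Sandbox(owner="robert"))
```

The point of the sketch is the shape of the interaction: the analyst never touches source systems or a warehouse model, only catalog entries and a sandbox.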


Ingest
• Discovery
• Classification
• Minimum Viable Product

Curate
• Roles & Governance
• Metadata Stewardship
• Asset Enhancement

Consume
• Data Catalog / Info Architecture
• Provisioning & Lease
• Sandbox / Tools Management


How data assets are found and nominated for inclusion in the Data Catalog; Consumption should drive Discovery-Ingest prioritization

Apply standards; ensure Governance readiness; measure, assess quality

[Diagram labels: EXTERNAL; NETWORK (search across network for likely data containers: FILE, TABLE, XML); Discovery Engine (a. Structure, b. Context, c. Semantics, d. Contents); LIBRARIAN / OPERATOR; DATA LAKE; CATALOG (entries); ASSET with METADATA; Metadata & DQ]
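
As a rough illustration of the discovery step, the sketch below scans a directory tree for likely data containers and labels each with one of the kinds named on the slide. It is a stand-in for a real discovery engine, which would search a network and inspect structure, context, semantics, and contents; the suffix-to-kind mapping and the function names are assumptions made for this example.

```python
from pathlib import Path

# Assumed mapping from file suffix to the container kinds named on the slide.
KIND_BY_SUFFIX = {
    ".csv": "FILE",
    ".txt": "FILE",
    ".xml": "XML",
    ".sqlite": "TABLE",
}


def classify(path):
    """Return the assumed container kind for a path, or None if not data-like."""
    return KIND_BY_SUFFIX.get(Path(path).suffix.lower())


def discover(root):
    """Walk `root` and return sorted (relative path, kind) nomination candidates."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        kind = classify(path)
        if kind is not None and path.is_file():
            found.append((path.relative_to(root).as_posix(), kind))
    return sorted(found)
```

Each returned pair is only a candidate; on the slide, a librarian or operator still decides which candidates become catalog entries.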


Bringing raw data into the Catalog; refinement into Data Assets; metadata standards, data enhancement, and “just enough” data governance


[Diagram: ingest and curation pipeline. Steps: read-copy, decompose, parse, ingest, steward; enrich (profile, link, quality, semantics) into a folio; publish. Sources: FILE, TABLE, XML. Stores: DATA LAKE (data) and CATALOG (entries; each DATA ASSET carries METADATA). Roles and actions: catalog librarian, data custodian, data request, asset curation.]

Metadata levels shown:
• data object: # records, size (MB)
• data instance (row)
• data element (column): field name, description, data type, length, precision, classifications …
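
The profiling part of curation can be illustrated with a small sketch that derives object-level and element-level metadata from rows of data. It covers only what can be computed mechanically (record count, field names, observed types, lengths, null counts); the description and classification fields named on the slide need a steward's input and are left out. The `ElementProfile` and `profile` names are hypothetical, and using Python type names as the "data type" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ElementProfile:
    """Element-level (column) metadata inferred from the data itself."""
    field_name: str
    data_type: str    # Python type name(s) observed, joined with "|"
    max_length: int   # longest non-null value when rendered as text
    null_count: int   # rows where the field is missing or None


def profile(rows):
    """Profile a list of dict rows into object- and element-level metadata."""
    # Collect field names in first-seen order.
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)

    elements = []
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        types = sorted({type(v).__name__ for v in present}) or ["unknown"]
        elements.append(ElementProfile(
            field_name=name,
            data_type="|".join(types),
            max_length=max((len(str(v)) for v in present), default=0),
            null_count=len(values) - len(present),
        ))
    return {"records": len(rows), "elements": elements}
```

For example, `profile([{"id": 1, "name": "Acme"}, {"id": 22, "name": None}])` reports two records, an `int` field `id` with maximum length 2, and a `str` field `name` with one null.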


User walk-up to Catalog, choose Data Asset (Fields), request Data Lease from Owner / Steward, Lease approval, Provision (copy or federated access) into generated Sandbox


[Diagram labels: DATA LAKE (DATA); CATALOG (DATA ASSET with METADATA, . . .); SANDBOX; LIFECYCLE MANAGEMENT. Asset owner / steward actions: publish, grant, revoke, delist. Data seeker actions: search, browse, choose, lease access.]

DATA LEASE
• Access expires on 5/1
• Max 5 inquiries per hour
• Max 3GB transfer per day
• Only use for purposes of internal non-collaborative research
• Combining >2 PII fields disallowed
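
The machine-checkable lease terms on this slide (expiry date, hourly inquiry cap, daily transfer cap, limit on combined PII fields) could be enforced at query time roughly as follows. This is a sketch under assumptions, not the mechanism used in the talk: the `DataLease` class is hypothetical, the hourly and daily limits are treated as sliding windows, and 3GB is taken as 3 × 1024³ bytes. The purpose-of-use term is a policy matter and is not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataLease:
    """Hypothetical lease carrying the limits listed on the slide."""
    expires: datetime
    max_queries_per_hour: int = 5
    max_bytes_per_day: int = 3 * 1024**3   # assumes binary gigabytes
    max_pii_fields: int = 2
    _log: list = field(default_factory=list)  # (timestamp, bytes) per allowed query

    def check(self, now, pii_fields, nbytes):
        """Return (allowed, reason); record the query only when it is allowed."""
        if now >= self.expires:
            return False, "lease expired"
        if len(pii_fields) > self.max_pii_fields:
            return False, "too many PII fields combined"
        # Sliding one-hour window over previously allowed queries.
        recent = [t for t, _ in self._log if now - t < timedelta(hours=1)]
        if len(recent) >= self.max_queries_per_hour:
            return False, "hourly inquiry limit reached"
        # Sliding one-day window over bytes already transferred.
        moved = sum(b for t, b in self._log if now - t < timedelta(days=1))
        if moved + nbytes > self.max_bytes_per_day:
            return False, "daily transfer limit reached"
        self._log.append((now, nbytes))
        return True, "ok"
```

A provisioning layer would call `check` before running each inquiry against the sandbox and refuse the request when the first element of the result is `False`.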


Comparing Business Intelligence with advanced / predictive analytics (Data Science)

[Chart: BUSINESS VALUE (Low to High) against TIME (Past to Future), with regions labeled Business Intelligence and Data Science.]

Predictive Analytics and Data Mining (Data Science)

Typical Techniques and Data Types
• Optimization, predictive modeling, forecasting, statistical analysis
• Structured/unstructured data, any types of sources, very large data sets

Common Questions
• What if…?
• What’s the optimal scenario for our business?
• What will happen next? What if these trends continue? Why is this happening?

Business Intelligence

Typical Techniques and Data Types
• Standard and ad hoc reporting, dashboards, alerts, queries, details on demand
• Structured data, traditional sources, manageable data sets

Common Questions
• What happened last quarter?
• How many did we sell?
• Where is the problem? In which situations?


Robert X.
Role: Sales Analyst
Tools: Desktop (Primary), mobile

“My focus is getting reliable and relevant data fast. People depend on me to deliver accurate reports showing how our planned strategies will increase revenue.”

[Slide section labels: Goals, Behaviors, Background]

• Well-versed in data, expert in business intelligence and analytic tools
• Knows how to search, retrieve and assemble data in many forms
• Recently hired into his role at Blue Data Technologies
• Strong knowledge of the technology industry
• Studied sales and marketing; worked as a sales associate at a software vendor prior to Blue Data
• Some familiarity with querying and languages to facilitate data discovery
• Looking to move into a more senior role of advanced modeling and predictive analytics
• Working on his first project to get vendor data for a vendor credit project

General
• I am a thought leader and analyst.
• I support the business planning functions.

Function and Role
• Spends a considerable amount of time retrieving & assembling data from different sources
• Performs a range of data blending and preparation tasks to create dashboards and data visualizations
• Uses spreadsheets & PowerPoint to provide interpretation & facilitate discussions

Collaboration / Communication
• I keep on top of things through industry conferences, vendor briefings, and the internet
• I communicate with sales management, helping them to understand the meaning of data patterns and predict future outcomes

• Spends majority of time with business intelligence and analytic tools
• Leans heavily on internal searches and his department and team to figure out where to find things
• Persistent in tracking down the data assets and sources needed for reporting


What if…
– Business stakeholders didn’t need to engage IT to update reports every time source data fields are changed?
– Data acquisition costs could be easily calculated and shared back to LOBs most using the information?
– Analysts could quickly explore and mash-up data assets, then share their results with peers in other BUs?


from 800 to eighty THOUSAND


Data Lake: staging, raw, “landing zone”, metadata focused

[Diagram labels: sources ERP, CRM, EXTERNAL; ETL; MDM, DQ]

Analytic Sandbox: “place to do exploratory analytics”
• Exploratory, ad hoc
• Unpredictable loads
• Experimental, iterative
• Loosely governed
• Bring your own tools

EDW / BI: “place to do at-scale data delivery”
• Production
• Predictable load
• SLA-driven
• Heavily governed
• Standard tools


Data Shopping Cart UX Prototype

Analytic Sandbox Provisioning Process Flow

[Diagram labels: Data Sources (ERP, Apps, Files, SaaS, Social Media, Relational DB, Legacy EDW); Real Time Data Feed, Schema, Batch; Data Management (Data Virtualization, Transformation); Data Lake (Raw Data, HDFS); Data Catalog | Metadata | DG & Stewardship; Data Catalog View; Request, Approve; Data Provisioning (Provisioned Schema, Query Time Join); Sandboxes (BI, Report, Analytics); Public Cloud, Private Cloud]


A real DaaS implementation from EMC IT

[Diagram: Provisioning Process Workflow. Components: SFDC, SFDC Adapter, RabbitMQ, Global IDs, Attivio, Greenplum, Activity BPM, Firewall, Data Catalog, Index, External Indexes, Incremental Data Update, External table, Physical schema, Provisioned Schema, Provisioned Sandboxes, Data analytics. Data & schema flow: Schema, Queue Data. Provisioning work flow: create data access request, approve data access, provision data access. Actors: Requestor, Approver.]

…is a 21st century alternative to the traditional Data Warehouse information delivery model.

…is the “killer app” for the Data Lake.™

…is a rich, automated, trustworthy, and just-in-time mechanism to quickly answer business questions.

…drives self-service analytics beyond data scientists to all stakeholders.


• Rapid access to data securely enables monetization
• Cost-savings over traditional data mart builds
• Access near real time data: detect trends
• Better management of data assets
• Shorter cycle-time to deploy enabler technologies
• Better utilization of compute resources
• Charge-back based on utilization


 SCOTT LEE  Data Principal  EMC² Global Professional Services  +1-312-497-8853 | [email protected]

DAMA Chicago, 15 April 2015