Foundation for Success: How Big Data Fits in an Information Architecture


Upload: inside-analysis

Post on 27-Jun-2015


Category:

Technology



DESCRIPTION

BDIA Roundtable Live Webcast on April 9, 2014. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=c84869fcca958d278b210cfca2a023a0

Big Data can offer big value and big challenges, and there are lots of solutions and promises out there. But to harness the most insight from Big Data, organizations need to solve pain points with more than triage. Since data challenges continue to permeate the information landscape, businesses would do well to incorporate solutions that fit into the infrastructure and provide a sustainable method for managing and analyzing Big Data.

Register for this Roundtable Webcast to hear veteran Analysts Robin Bloor, Mike Ferguson and Richard Winter as they offer their perspectives on the evolving Big Data industry. They’ll comment on the proposed Big Data Information Architecture and take questions from the audience.

This is the second event of The Bloor Group's Interactive Research Report for 2014, which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report. Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Foundation for Success: How Big Data Fits in an Information Architecture

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Foundation for Success: How Big Data Fits in an Information Architecture

“The Inevitable Shift: How Big Data Impacts Enterprise Architecture”

RoundTable Webcast | April 9, 2014

Page 3: Foundation for Success: How Big Data Fits in an Information Architecture

Host

Eric Kavanagh CEO, The Bloor Group @eric_kavanagh [email protected]

Page 4: Foundation for Success: How Big Data Fits in an Information Architecture

Findings Webcast June 25, 2014

Big Data Information Architecture

Roundtable Webcast April 9, 2014

Exploratory Webcast January 22, 2014

#BigDataArch

Page 5: Foundation for Success: How Big Data Fits in an Information Architecture

Analysts

Robin Bloor Chief Analyst, The Bloor Group

Richard Winter President & Founder, WinterCorp

Mike Ferguson Managing Director, Intelligent Business Strategies

Page 6: Foundation for Success: How Big Data Fits in an Information Architecture

BIG DATA

Page 7: Foundation for Success: How Big Data Fits in an Information Architecture

Hadoop as the Data Reservoir

Page 8: Foundation for Success: How Big Data Fits in an Information Architecture

Big Data and the Data Reservoir

Page 9: Foundation for Success: How Big Data Fits in an Information Architecture

BDIA: The Story So Far

Robin Bloor, Ph.D.

Page 10: Foundation for Success: How Big Data Fits in an Information Architecture

Big Data – A Poorly Defined Term

WHAT IS BIG DATA?

Business data

Traditional data

Log file data

Operational data

Mobile data

Location data

Social network data

Public data

Commercial databases

Streaming data

Internet of Things

Page 11: Foundation for Success: How Big Data Fits in an Information Architecture

A TRANSACTION is a MOLECULE of ATOMIC EVENTS

The ATOM of data has become the EVENT

Atoms and Molecules
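The atom/molecule idea on this slide can be pictured as a data structure. The sketch below is only an illustration under assumed names (`Event`, `Transaction`, the event kinds and timestamps are all hypothetical and not from the deck): a transaction is modelled as nothing more than a group of atomic events.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """One atomic event: something that happened at a point in time."""
    timestamp: float               # seconds since the start of the session (assumed unit)
    kind: str                      # hypothetical event type label
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Transaction:
    """A 'molecule': a business transaction assembled from atomic events."""
    transaction_id: str
    events: List[Event]

    def duration(self) -> float:
        """Elapsed time between the first and last event in the transaction."""
        times = [e.timestamp for e in self.events]
        return max(times) - min(times)


# Hypothetical example: three events that together form one order.
order = Transaction(
    "T-1",
    [
        Event(0.0, "basket_created"),
        Event(4.5, "payment_authorised"),
        Event(5.0, "order_confirmed"),
    ],
)
```

Storing the events rather than only the finished transaction keeps the finer-grained record available for later analysis.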

Page 12: Foundation for Success: How Big Data Fits in an Information Architecture

The Traffic Cop (Events)

Page 13: Foundation for Success: How Big Data Fits in an Information Architecture

Atoms and Molecules

DATA FLOW is becoming a driving factor

This suggests the need for a DATA RESERVOIR

Page 14: Foundation for Success: How Big Data Fits in an Information Architecture

Hadoop as the Data Reservoir

Page 15: Foundation for Success: How Big Data Fits in an Information Architecture

Big Data and the Data Reservoir

Page 16: Foundation for Success: How Big Data Fits in an Information Architecture

The Workload Paradigm Shift

• Previously, we viewed database workloads as an I/O optimization problem

• With analytics, the workload is a highly variable mix of I/O and calculation

• No databases were built precisely for this – not even Big Data databases

Page 17: Foundation for Success: How Big Data Fits in an Information Architecture

The Big Data Applications

It’s pretty much all about BI & ANALYTICS

Page 18: Foundation for Success: How Big Data Fits in an Information Architecture

The Biological System

• Our human control system works at different speeds:
    • Almost instant reflex
    • Swift response
    • Considered response

• Organizations will gradually implement similar control systems

• This suggests a data-flow-based architecture

• The EDW is memory

Page 19: Foundation for Success: How Big Data Fits in an Information Architecture

The Corporate Biological System

• Right now this division into two different data flows is already occurring

• Currently we can distinguish between:
    • Real-time/Business-time applications
    • Analytical applications

• We should build specific architectures for this

Page 20: Foundation for Success: How Big Data Fits in an Information Architecture

WINTERCORP

THE LARGE SCALE DATA MANAGEMENT EXPERTS

Big Data Information Architecture Bloor Group Roundtable

Richard Winter WinterCorp

April 2014

Page 21: Foundation for Success: How Big Data Fits in an Information Architecture

Big Data and the Data Reservoir

From Robin’s charts:

Page 22: Foundation for Success: How Big Data Fits in an Information Architecture

©2010 Winter Corporation. All Rights Reserved. WINTERCORP: THE LARGE SCALE DATA MANAGEMENT EXPERTS

It’s About the Platforms & Their Roles

• Data Warehouse
• Data Mart
• Data Refinery
• Data Landing Zone
• Data Discovery
• Graph Analytics
• Etc.

© 2012, 2013, 2014 WINTER CORPORATION, BELMONT MA. ALL RIGHTS RESERVED.

Page 23: Foundation for Success: How Big Data Fits in an Information Architecture

Data Refining Example: Data from Turbines

© 2010, 2011, 2012 WINTER CORPORATION, CAMBRIDGE MA. ALL RIGHTS RESERVED.

Page 24: Foundation for Success: How Big Data Fits in an Information Architecture

Data Refining Example: Data Management Requirements

1. Hundreds of TB or more of data per week

2. Raw data life: few hours to a few days

3. Challenge: find the important events or trends quickly

4. Massive analysis problem

5. When analyzing, read entire files

6. Keep only the significant data
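Requirements 5 and 6 (read entire files, keep only the significant data) can be sketched in a few lines. This is a minimal illustration, not the method used in the turbine example: the function name `significant_readings`, the two-sigma rule and the sample values are all assumptions made for the sketch.

```python
from statistics import mean, pstdev
from typing import Iterable, List


def significant_readings(readings: Iterable[float],
                         threshold_sigmas: float = 2.0) -> List[float]:
    """Scan a whole batch of raw readings and keep only the outliers.

    'Significant' is assumed here to mean more than `threshold_sigmas`
    standard deviations from the batch mean; a real refinery would use
    whatever rule the analysts define.
    """
    values = list(readings)          # read the entire batch, as in requirement 5
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []                    # perfectly flat signal: nothing stands out
    return [v for v in values if abs(v - mu) > threshold_sigmas * sigma]


# Hypothetical batch: nine normal readings and one spike.
raw_batch = [50.0] * 9 + [95.0]
kept = significant_readings(raw_batch)   # only the spike survives
```

Everything not returned by the filter can be discarded once the raw data reaches the end of its short life.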


Page 25: Foundation for Success: How Big Data Fits in an Information Architecture

Business Example: Enterprise Data Warehouse

Page 26: Foundation for Success: How Big Data Fits in an Information Architecture

Enterprise Data Warehouse: Data Management Requirements

1. Data volume
   a. TB to PB – all retained for at least five years
   b. Continual growth of data and workload

2. Data sources: hundreds to thousands
   a. Data sources change their feeds frequently
   b. New data sources are frequent

3. Challenges
   a. Data must be correct
   b. Data must be integrated

4. Typical enterprise data lifetime: decades

5. Analytic application lifetime: years

6. Many thousands of data users (10^4 – 10^6)

7. Hundreds of analytic applications

8. Thousands of one time analyses

9. Tens of thousands of complex queries

Page 27: Foundation for Success: How Big Data Fits in an Information Architecture

Some Platform Examples

Requirement                      Platform
Data Refinery                    Hadoop
Complex SQL Query                Data Warehouse
Enforce/Manage Business Rules    Data Warehouse
Intensive Batch Processing       Hadoop
Simple Data Mart                 Multiple Options
Data Discovery                   New Category
Integrated Data                  Data Warehouse
Data Landing Zone                Hadoop
Document Store                   Multiple Options
Stream Processing                Multiple Options
ETL                              Multiple Options
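The requirement-to-platform pairs on this slide amount to a lookup table. As a small sketch (the names `PLATFORM_BY_REQUIREMENT` and `suggest_platform` are hypothetical; only the pairs themselves come from the slide):

```python
# Requirement -> platform pairs as listed on the slide.
PLATFORM_BY_REQUIREMENT = {
    "data refinery": "Hadoop",
    "complex sql query": "Data Warehouse",
    "enforce/manage business rules": "Data Warehouse",
    "intensive batch processing": "Hadoop",
    "simple data mart": "Multiple Options",
    "data discovery": "New Category",
    "integrated data": "Data Warehouse",
    "data landing zone": "Hadoop",
    "document store": "Multiple Options",
    "stream processing": "Multiple Options",
    "etl": "Multiple Options",
}


def suggest_platform(requirement: str) -> str:
    """Return the platform the slide pairs with a requirement.

    Matching is case-insensitive; anything the slide does not list is
    reported as such rather than guessed.
    """
    key = requirement.strip().lower()
    return PLATFORM_BY_REQUIREMENT.get(key, "Not listed on the slide")


first_choice = suggest_platform("Data Landing Zone")
```

A table like this is only a starting point, since the following slides stress that cost tradeoffs and shifting categories also affect the choice.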


Page 28: Foundation for Success: How Big Data Fits in an Information Architecture

Understand the Platform Cost Tradeoffs

• Cost tradeoffs can be surprising – platform cost is not always the driver

• Requires a total cost framework & systematic approach

• “Big Data: What Does it Really Cost?” wintercorp.com/tcod-report

Page 29: Foundation for Success: How Big Data Fits in an Information Architecture

Data Platforms: A Changing Picture

• Categories are not settled

• Data Warehouse has a continuing, major role

• Hadoop has a major role

• Everything else is in flux

Page 30: Foundation for Success: How Big Data Fits in an Information Architecture

Big Data Information Architecture

Mike Ferguson
Managing Director, Intelligent Business Strategies
Bloor Group Big Data Roundtable
April 2014

Twitter: @mikeferguson1

Page 31: Foundation for Success: How Big Data Fits in an Information Architecture

For Many Years The Traditional Data Warehouse and BI Environment Has Been Used For Analysis & Reporting

[Diagram: operational systems and web sources feed Data Integration / DQ, which loads the data warehouse & data marts (DW). A BI tools platform produces reports & analytics, delivered through a portal to employees, partners and customers.]

Page 32: Foundation for Success: How Big Data Fits in an Information Architecture

However There Are New Types of Data That Businesses Now Want to Analyse

§ Web data
    • Clickstream data, e-commerce logs
    • Social networks data e.g., Twitter

§ Semi-structured data e.g., e-mail, XML, JSON

§ Unstructured content
    • How much is TEXT worth to you?

§ Sensor data
    • Temperature, light, vibration, location, liquid flow, pressure, RFIDs

§ Vertical industries structured transaction data
    • E.g. Telecom call data records, retail

Source: Analytics: The Real-World Use of Big Data, Said Business School Oxford and IBM

Page 33: Foundation for Success: How Big Data Fits in an Information Architecture

The Impact of Big Data – We Now Have Different Platforms Optimised For Different Analytical Workloads

Big Data workloads now mean we require multiple platforms for analytical processing

[Diagram, platforms: streaming data; Hadoop data store; Data Warehouse RDBMS (EDW, DW & marts); Analytical RDBMS / DW appliance; NoSQL DBMS, e.g. graph DB; MDM (product, asset and customer master data, maintained via CRUD). Also labelled: Advanced Analytics (multi-structured data) and Advanced Analytics (structured data).]

[Diagram, workloads: real-time stream processing & decision management; investigative analysis, data refinery; traditional query, reporting & analysis; data mining, model development; graph analysis; master data management.]

Page 34: Foundation for Success: How Big Data Fits in an Information Architecture

Hadoop Is A Platform At The Heart of Big Data Analytics – There Are Multiple Ways To Access Hadoop

[Diagram: BI tools / apps reach data in Hadoop through SQL (a vendor SQL on Hadoop engine), Pig Latin scripts, or a MapReduce application using Java MapReduce APIs to HDFS, HBase and Cascading. These run on MapReduce or another Hadoop 2.0 framework over YARN, on top of HDFS files, indexes and partitions.]

webHDFS: an HTTP interface to HDFS that has REST APIs
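Of the access paths on this slide, webHDFS is the easiest to show in a few lines, because a request is just a URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and sends nothing over the network. The host name, the path and the helper name `webhdfs_url` are hypothetical, and port 50070 is assumed as the Hadoop 2.x NameNode HTTP default.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL for one operation on one HDFS path.

    `op` is a WebHDFS operation name such as LISTSTATUS or OPEN;
    any extra keyword arguments become additional query parameters.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical cluster and directory, for illustration only.
list_url = webhdfs_url("namenode.example.com", 50070, "/data/weblogs", "LISTSTATUS")
```

An HTTP GET on `list_url` would ask the NameNode for a JSON directory listing, which is what lets tools without Hadoop client libraries read HDFS.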

Page 35: Foundation for Success: How Big Data Fits in an Information Architecture

Popular Hadoop Use Cases

§ Hadoop as a data refinery
    • Offloading data integration from a DW

§ Hadoop for investigative analysis in an analytical sandbox

§ Hadoop as an on-line data warehouse archive

Page 36: Foundation for Success: How Big Data Fits in an Information Architecture

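The Hadoop Data Refinery

[Diagram: data from XML and JSON files, social, web logs, web and cloud sources, operational systems (ERP, CRM, SCM) and an operational NoSQL DB lands in Hadoop, where ELT processing refines it. The resulting insights are provisioned to the EDW, data marts, DW appliance, analytical DBMS and graph DBMS.]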

Page 37: Foundation for Success: How Big Data Fits in an Information Architecture

A Centralised Hadoop Based Data Refinery is One Way to Scale at Reduced Cost

Data Hub - Consume, Clean, Integrate, Analyse And Provision Data From Hadoop To Any Analytical Platform

Sometimes analysts refer to this as a Data Refinery.

What is the purpose of the data refinery?

Is it to process un-modelled data or all data?

[Diagram: sources (feeds, sensors, RDBMS, files, office docs, social, cloud, web logs, web services) land in a Hadoop staging area / landing zone. Generated MapReduce ELT jobs carry out the ELT processing, and a sandbox supports exploratory analysis and business insight. Refined data is provisioned to the EDW, DW & marts, a DW appliance for advanced analytics on structured data, and a NoSQL DB, e.g. graph DB.]

Page 38: Foundation for Success: How Big Data Fits in an Information Architecture

Investigative Analysis Can Be Done In A Hadoop Sandbox

Multi-structured data:
• Click stream web log data
• Customer interaction data
• Social interaction data (e.g. Twitter, Facebook)
• Sensor data
• Rich media data (video, audio)
• External web content
• Documents
• Internal web content
• Seismic data (oil & gas)

Historical data:
• Archived DW data
• Master data

[Diagram: data scientists carry out investigative / exploratory analysis in the sandbox, drawing on the MDM system (asset, customer and product master data, maintained via CRUD) and on the EDW and marts, to produce new business insight.]

Page 39: Foundation for Success: How Big Data Fits in an Information Architecture

Joining Big Data With Master Data During Exploratory Analysis Can Produce Insight for Competitive Advantage

Big data (streaming data, graph data, multi-structured data) + master data = business value created:
• Sentiment: customer sentiment & product sentiment
• Customer online behaviour
• Prospects & influencers
• Sensor data: field service optimization, risk management, asset performance

[Diagram labels: master data (customer, product, asset) maintained via CRUD; NoSQL DB e.g. graph DB]
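The join this slide describes can be sketched with plain dictionaries. Everything below is invented for illustration (customer IDs, segments, sentiment scores and the function name `average_sentiment_by_segment`); the point is only that raw sentiment records become more useful once each one is matched to a master-data record.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical master data, keyed by customer ID.
master_customers = {
    "C001": {"name": "Alice", "segment": "gold"},
    "C002": {"name": "Bob", "segment": "silver"},
}

# Hypothetical sentiment scores derived from social data (-1 = negative, +1 = positive).
sentiment_events = [
    {"customer_id": "C001", "score": -0.5},
    {"customer_id": "C001", "score": -1.0},
    {"customer_id": "C002", "score": 0.5},
    {"customer_id": "C999", "score": 1.0},   # no matching master record
]


def average_sentiment_by_segment(events: List[dict],
                                 customers: Dict[str, dict]) -> Dict[str, float]:
    """Join sentiment events to master data and average the score per segment."""
    scores = defaultdict(list)
    for event in events:
        customer = customers.get(event["customer_id"])
        if customer is None:
            continue                 # unmatched big data stays out of the result
        scores[customer["segment"]].append(event["score"])
    return {segment: sum(vals) / len(vals) for segment, vals in scores.items()}


by_segment = average_sentiment_by_segment(sentiment_events, master_customers)
```

The unmatched `C999` record shows why master data matters here: without a matching key, the sentiment cannot be tied to a known customer.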

Page 40: Foundation for Success: How Big Data Fits in an Information Architecture

New Insights Can Be Added Into A DW To Enrich What You Already Know

E.g. deriving insight from social web sites for sentiment analytics

[Diagram: data scientists work in a sandbox on social, web log, web and cloud data. New insights are loaded through data integration (DI) into the DW, alongside data from operational systems.]

Page 41: Foundation for Success: How Big Data Fits in an Information Architecture

Alternatively New Insights In Hadoop Can Be Integrated With A DW Using Data Virtualization To Provide Enriched Information

E.g. deriving insight from social web sites for sentiment analytics

[Diagram: data scientists work in a sandbox on social, web log, web and cloud data. New insights stay in Hadoop and are queried with SQL on Hadoop. OLTP systems feed the DW through data integration (DI). Data virtualisation combines the two.]

Page 42: Foundation for Success: How Big Data Fits in an Information Architecture

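Using Hadoop As A Data Archive Means Data Can Be Kept On-line, Analysed And Still Integrated With Data In The DW

Archive unused data, or data > n years old

[Diagram: OLTP systems feed the DW through data integration (DI). Archived data is held in Hadoop and queried with SQL on Hadoop. Data virtualisation combines archived data with DW data to deliver new insights.]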
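The "data > n years" rule on this slide is easy to express as a cutoff date. The sketch below assumes each row carries a `created` date and that the row layout and function names (`years_before`, `split_for_archive`) are hypothetical; the "unused" criterion is not modelled because the slide does not say how usage is measured.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def years_before(day: date, years: int) -> date:
    """Return the date `years` years before `day`, clamping 29 February."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:               # 29 February with a non-leap target year
        return day.replace(year=day.year - years, day=28)


def split_for_archive(rows: List[Row], today: date,
                      n_years: int) -> Tuple[List[Row], List[Row]]:
    """Split rows into (keep in DW, move to archive) by age."""
    cutoff = years_before(today, n_years)
    keep = [r for r in rows if r["created"] >= cutoff]
    archive = [r for r in rows if r["created"] < cutoff]
    return keep, archive


# Hypothetical rows, evaluated on the date of the webcast with n = 3.
rows = [
    {"id": 1, "created": date(2010, 1, 1)},
    {"id": 2, "created": date(2013, 6, 1)},
]
keep, archive = split_for_archive(rows, date(2014, 4, 9), 3)
```

In the architecture on the slide, the `archive` rows would move to Hadoop and remain queryable, rather than going offline.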

Page 43: Foundation for Success: How Big Data Fits in an Information Architecture

Real-time Data From NoSQL DBMSs Can Also Be Joined To DW Data Using Data Virtualization

Nested data like JSON needs to be handled by the data virtualisation server

[Diagram: social, web log and sensor data is held as nested data in NoSQL DBs (column family DB, document DB). OLTP systems feed the DW through data integration (DI). Data virtualisation joins the two to deliver real-time insights.]

Page 44: Foundation for Success: How Big Data Fits in an Information Architecture

Investigative Analysis Can Be Done In A Graph DBMS – New Insight Can Also Come From Graph Analysis

[Diagram: data scientists carry out investigative / exploratory analysis in a graph DBMS, combining multi-structured data with structured master data from the MDM system (asset, customer, product; maintained via CRUD) to produce new business insight.]

Page 45: Foundation for Success: How Big Data Fits in an Information Architecture

SQL Access To Big Data - Options

• SQL access to big data in Hadoop
• SQL access to big data in an analytical RDBMS
• SQL access to streaming data in motion
• SQL access to big data via data virtualisation (a data virtualisation server over the DW and other stores)
• SQL access to a combination of the above

Page 46: Foundation for Success: How Big Data Fits in an Information Architecture

SQL on Hadoop Challenges – Multi-structured Data May Need to Be Analysed

{
  "firstName": "Wayne",
  "lastName": "Rooney",
  "age": 25,
  "address": {
    "streetAddress": "21 Sir Matt Busby Way",
    "city": "Manchester",
    "country": "England",
    "postalCode": "M1 6DY"
  },
  "phoneNumbers": [
    { "type": "home", "number": "0161-123-1234" },
    { "type": "mobile", "number": "07779-123234" }
  ]
}

JSON data

Text data

Image Data

SQL??

SQL??

SQL??
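One common answer to the "SQL??" question for JSON is to flatten nested objects into columns and turn arrays into child rows. The sketch below applies that idea to the record on this slide. It is not how any particular SQL on Hadoop engine works, and the helper name `flatten` and the underscore naming of columns are assumptions.

```python
import json
from typing import Any, Dict

RECORD = json.loads("""
{
  "firstName": "Wayne", "lastName": "Rooney", "age": 25,
  "address": {"streetAddress": "21 Sir Matt Busby Way", "city": "Manchester",
              "country": "England", "postalCode": "M1 6DY"},
  "phoneNumbers": [{"type": "home", "number": "0161-123-1234"},
                   {"type": "mobile", "number": "07779-123234"}]
}
""")


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into one row; arrays are skipped for child tables."""
    row: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=name + "_"))
        elif isinstance(value, list):
            continue                 # arrays become rows in a separate table
        else:
            row[name] = value
    return row


# One parent row, plus one child row per phone number keyed by lastName.
person_row = flatten(RECORD)
phone_rows = [{"lastName": RECORD["lastName"], **phone}
              for phone in RECORD["phoneNumbers"]]
```

Text and image data do not flatten this way, which is why the slide leaves those question marks open.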

Page 47: Foundation for Success: How Big Data Fits in an Information Architecture


SQL on Hadoop Challenges – Multi-structured Data May Need to Be Analysed

Web log data

Tab delimited file data

SQL??

SQL??
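For tab delimited data the path to SQL is shorter: apply a schema on read, then query. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a SQL engine (it is not a Hadoop component), and the three-column log layout and sample rows are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited web log with a header row.
RAW_LOG = (
    "ts\tpath\tstatus\n"
    "2014-04-09T10:00:00\t/home\t200\n"
    "2014-04-09T10:00:05\t/missing\t404\n"
    "2014-04-09T10:00:09\t/home\t200\n"
)


def load_weblog(raw: str) -> sqlite3.Connection:
    """Parse tab-delimited text and load it into an in-memory SQL table."""
    reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
    rows = [(r["ts"], r["path"], int(r["status"])) for r in reader]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblog (ts TEXT, path TEXT, status INTEGER)")
    conn.executemany("INSERT INTO weblog VALUES (?, ?, ?)", rows)
    return conn


conn = load_weblog(RAW_LOG)
hits_by_status = conn.execute(
    "SELECT status, COUNT(*) FROM weblog GROUP BY status ORDER BY status"
).fetchall()
```

The hard part in practice is agreeing on the column names and types, since the file itself carries no schema beyond its delimiter.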

Page 48: Foundation for Success: How Big Data Fits in an Information Architecture

Hadoop Storage Is Independent of Any SQL Engine Accessing HDFS - Multiple SQL Engines Can Coexist On The Same Data

§ Key points about Hadoop
    • It is possible to have MULTIPLE SQL engines on the same data
    • Different SQL engines run on different Hadoop frameworks (M/R, Tez, Spark) or on no framework at all, i.e. they directly access HDFS or HBase data

Storage is independent of any SQL engine

[Diagram: four SQL engines over shared storage]

Source: Hortonworks

Page 49: Foundation for Success: How Big Data Fits in an Information Architecture

Relational DBMS / Hadoop Integration – Several Vendors Have Integrated RDBMS with Hadoop to Run Analytics

[Diagram: SQL and XQuery queries go to a relational DBMS, which reaches HDFS / HBase / Hive through external polymorphic table function(s).]

• RDBMS optimizer handles transparent access to external analytical platforms on behalf of the user
• Allows join across data in a single RDBMS and Hadoop
• RDBMS and Hadoop could be deployed on the same hardware cluster or on different hardware clusters

• CitusDB
• Exasol EXAPowerlytics
• IBM PureData System for Analytics and DB2 HDFS clients
• Oracle HDFS Client
• Pivotal HAWQ PXF
• Teradata SQL-H

Page 50: Foundation for Success: How Big Data Fits in an Information Architecture

Self-Service Access To Big Data Via Data Virtualization

[Diagram: a business analyst uses self-service BI (a self-service data discovery & visualisation tool or dashboard server) on top of a data virtualization and optimization layer. That layer spans personal & office data, predictive models, transaction systems, the DW and data management tools (ETL, DQ, etc.).]

BUT what about optimization? Can the data virtualisation server push down analytics to underlying platforms to make them do the work?

Product examples: Cirro, Cisco, Denodo, Informatica Data Services, ScleraDB
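The push-down question can be made concrete by counting how many rows leave the underlying platform. The sketch below is an assumption-laden illustration, not a description of any listed product: `sqlite3` stands in for a source platform, and the table, regions and amounts are invented.

```python
import sqlite3
from typing import Tuple

# A stand-in source platform holding hypothetical sales rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.0)],
)


def total_without_pushdown(conn: sqlite3.Connection, region: str) -> Tuple[float, int]:
    """Pull every row to the virtualisation layer, then filter and sum there."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    total = sum(amount for row_region, amount in rows if row_region == region)
    return total, len(rows)          # (answer, rows transferred)


def total_with_pushdown(conn: sqlite3.Connection, region: str) -> Tuple[float, int]:
    """Send the filter and the aggregate to the source; only the answer comes back."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0], len(rows)     # (answer, rows transferred)


slow = total_without_pushdown(source, "north")
fast = total_with_pushdown(source, "north")
```

Both calls give the same total, but the pushed-down version moves one row instead of the whole table, and that gap grows with data volume.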

Page 51: Foundation for Success: How Big Data Fits in an Information Architecture

Conclusions - People In Different Roles In The Analytical Landscape Need to Work Together To Deliver Value

Data Scientist (sandbox)
• Exploratory analysis
• Model producer

Business Analyst (analytical)
• Model consumer
• Data discovery & visualisation
• Information producer
    • Build reports
    • Build and publish dashboards

Business Manager / Operations Worker (operational)
• Information consumer
• Decision maker
• Action taker

Page 52: Foundation for Success: How Big Data Fits in an Information Architecture


www.intelligentbusiness.biz [email protected]

Twitter: @mikeferguson1

Tel/Fax (+44)1625 520700

Thank You!

Page 53: Foundation for Success: How Big Data Fits in an Information Architecture

ROUNDTABLE DISCUSSION

Page 54: Foundation for Success: How Big Data Fits in an Information Architecture

Questions?

#BigDataArch or

USE THE Q&A

Page 55: Foundation for Success: How Big Data Fits in an Information Architecture

THANK YOU!

REGISTER FOR BDIA WEBCASTS AT: http://insideanalysis.com/research/big-data-information-architecture

Image on Slide 53 borrowed from http://www.apieceofmonologue.com/2012/08/stanley-kubrick-film-photography-design.html