Architecting a Big Data Platform - IBM

Architecting A Big Data Platform for Analytics

WHITE PAPER

INTELLIGENT BUSINESS STRATEGIES

Upload: dsilbe01

Post on 03-Jun-2018


Page 1: Architecting a Big Data Platform - IBM

8/12/2019 Architecting a Big Data Platform - IBM

http://slidepdf.com/reader/full/architecting-a-big-data-platform-ibm 1/36

 

Architecting A Big Data Platform for Analytics

WHITE PAPER

INTELLIGENT BUSINESS STRATEGIES


Table of Contents

Introduction
  Business Demand To Analyse New Data Sources
  The Growth in Workload Complexity
  The Growth In Data Complexity
    Variety of Data Types
    Data Volume
    Velocity of Data Generation
  The Growth In Analytical Complexity
What is Big Data?
  Types of Big Data
  Why Analyse Big Data?
  Big Data Analytic Applications
Big Data Analytical Workloads
  Analysing Data In Motion For Operational Decisions
  Exploratory Analysis of Un-Modelled Multi-Structured Data
  Complex Analysis of Structured Data
  The Storage, Re-processing and Querying of Archived Data
  Accelerating ETL Processing of Structured and Un-modeled Data
Technology Options for End-to-End Big Data Analytics
  Event Stream Processing Software For Big Data-In-Motion
  Storage Options for Analytics On Big Data At Rest
    Analytical RDBMSs Appliances
    Hadoop Solutions
    NoSQL DBMSs
    Which Storage Option Is Best?
  Scalable Data Management Options For Big Data at Rest
  Options for Analysing Big Data
Integrating Big Data Into Your Traditional DW/BI Environment
  The New Enterprise Analytical Ecosystem
  Joined Up Analytical Processing – The Power of Workflow
  Technology Requirements for the New Analytical Ecosystem
Getting started: An Enterprise Strategy For Big Data Analytics
  Business Alignment
  Workload Alignment With Analytical Platform
  Skill Sets
  Create An Environment For Data Science And Exploration
  Define Analytical Patterns and Workflows
  Integrate Technology to Transition to the Big Data Enterprise
Vendor Example: IBM's End-to-End Solution for Big Data
  IBM InfoSphere Streams – Analysing Big Data In Motion
  IBM Appliances for Analysing Data At Rest
    IBM InfoSphere BigInsights
    IBM PureData System for Analytics (powered by Netezza technology)
    IBM PureData System for Operational Analytics
    IBM Big Data Platform Accelerators
    IBM DB2 Analytic Accelerator (IDAA)
  IBM Information Management for the Big Data Enterprise
  IBM Analytical Tools For The Big Data Enterprise
    IBM BigSheets
    IBM Cognos 10
    IBM Cognos Consumer Insight (CCI)
    IBM SPSS
    IBM Vivisimo
  How They Fit Together For End-to-end Business Insight
Conclusion


INTRODUCTION 


BUSINESS DEMAND TO ANALYSE NEW DATA SOURCES

 

Organisations have been building data warehouses for many years to analyse business activity

The BI market is mature but BI still remains at the forefront of IT investment

New, more complex data has emerged and is being generated at rates never seen before

Social network data, web logs, archived data and sensor data are all new data sources attracting analytical attention


THE GROWTH IN WORKLOAD COMPLEXITY

THE GROWTH IN DATA COMPLEXITY 

 

 

 

   

Variety of Data Types

 

 

 

 

Data Volume

Velocity of Data Generation

THE GROWTH IN ANALYTICAL COMPLEXITY 

Data and analytical workload complexity is growing

New types of data are being captured

Much of this data is un-modelled

Investigative analysis is needed to determine its structure before it can be brought into a data warehouse

Some new sources of data are also very large in volume

The rate at which data is being created is also increasing

Analytical complexity is also growing


Several types of analysis may be needed to solve business problems

In many cases, determining the insight needed is now a process involving multiple types of analyses

Not all analyses in an analytical process can always be done on a single platform


WHAT IS BIG DATA? 

Big Data is therefore a term associated with the new types of workloads and underlying technologies needed to solve business problems that we could not previously support due to technology limitations, prohibitive cost or both.

... the data warehouse is an integral part of the extended analytical environment.

 

 

TYPES OF BIG DATA

The spectrum of analytical workloads is now so broad that it cannot all be dealt with in a single enterprise data warehouse

A new extended analytical environment is now needed

Big Data is a term associated with new types of workloads that cannot be easily supported in traditional environments

Big Data is therefore NOT just about data volumes

Big Data can be associated with both structured and multi-structured data

The data warehouse is an integral part of the extended analytical environment

Analytical requirements and data characteristics will dictate the technology deployed


WHY ANALYSE BIG DATA?

 

 

 

BIG DATA ANALYTIC APPLICATIONS

Web logs and social network interaction data

High volume transaction data

Sensor data

Text

Companies are now collecting data to fend off future liabilities

The analysis of multi-structured data may produce additional insight that can enrich what a company already knows

More detail improves the accuracy of business insights and responsiveness

A shortage of skills and market confusion are inhibiting adoption of Big Data technologies


A broad range of use cases exists for big data analytics

Web site optimisation is achieved by analysing web logs

On-line advertising requires analysis of clickstream while users are on-line

Sensors are opening up a whole new range of optimisation opportunities

Text analysis is needed to determine customer sentiment


BIG DATA ANALYTICAL WORKLOADS 

   

 

 

 

     

ANALYSING DATA IN MOTION FOR OPERATIONAL DECISIONS 


There are a number of big data analytical workloads that extend beyond the traditional data warehouse environment

Event-stream processing is about automatically detecting, analysing and, if necessary, acting on events to keep the business optimised

Thousands of events can occur in business operations throughout a working day

People cannot be expected to spot every problem

Event stream processing requires analysis of data to take place before data is stored anywhere
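To illustrate analysis taking place before data is stored, the following is a minimal Python sketch, not a depiction of any particular event stream processing product. The function name `detect_exceptions`, the event fields `sensor` and `value`, and the rolling-mean rule with its `window` and `threshold` parameters are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


def detect_exceptions(events, window=5, threshold=1.5):
    """Yield events whose value exceeds `threshold` times the rolling
    mean of the previous `window` values (an illustrative rule only)."""
    recent = deque(maxlen=window)  # only a small in-memory window is kept
    for event in events:
        # Analyse the event as it arrives; nothing has been persisted yet.
        if len(recent) == window and event["value"] > threshold * mean(recent):
            yield event
        recent.append(event["value"])


# Hypothetical sensor readings arriving one at a time.
readings = [{"sensor": "pump-1", "value": v}
            for v in [10, 11, 10, 12, 11, 30, 11, 10]]

# Only the exceptional events need to be acted on or stored.
alerts = list(detect_exceptions(readings))
print(alerts)
```

Because the function is a generator, each event is examined as it flows through, and only the small rolling window is held in memory.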


EXPLORATORY ANALYSIS OF UN-MODELLED MULTI-STRUCTURED DATA 

 

   

 

 

 

 

 


 

   

   

In some industries the volume of event data can be significant

Un-modelled multi-structured data needs to be explored to determine what subset of data is of business value

Reputation management and ‘voice of the customer’ are dominating analysis of text

Text can vary in terms of language and format

Quality can also be a problem

Multi-structured data is hard to analyse


COMPLEX ANALYSIS OF STRUCTURED DATA 

Multiple analytical passes may be needed to determine insights

Search-based analytical tools may help with this type of workload

Content analytics can go beyond text to analyse sound and video

Exploratory analytics of un-modelled data is a process

Data mining is a popular example of complex analysis on structured data

Predictive and statistical models can be built for deployment in-database or in real-time operations

Some vertical industries are investing heavily in complex analysis to mitigate risk


THE STORAGE, RE-PROCESSING AND QUERYING OF ARCHIVED DATA 

 

 

 

 

 

ACCELERATING ETL PROCESSING OF STRUCTURED AND UN-MODELED DATA

Storing and analysing archived data in a big data environment is now attracting interest

Compliance, audit, e-discovery and data warehouse archive are all reasons for wanting to do this

Integration between multi-structured data platforms and data warehouses may be needed to process and analyse data


There is now a need to push analytics down into ETL processing to automatically analyse un-modelled data in order to consume data of interest more rapidly


TECHNOLOGY OPTIONS FOR END-TO-END BIG DATA ANALYTICS

EVENT STREAM PROCESSING SOFTWARE FOR BIG DATA-IN-MOTION 

 

 

   

 

 

New technologies need to be added to traditional environments to support big data analytical workloads

Stream processing software supports real-time analytical applications designed to continuously optimise business operations

The software must cope with high velocity ‘event storms’ where events arrive out of sequence at very high rates
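One common way to cope with events that arrive out of sequence is to hold them briefly in a buffer and release them in timestamp order once a tolerance has passed. The sketch below shows that idea in plain Python. It is an assumption-laden illustration: the `reorder` function, the `max_delay` tolerance, and the policy of silently dropping events that arrive too late are hypothetical choices, not the behaviour of any specific product.

```python
import heapq
from itertools import count


def reorder(events, max_delay):
    """Yield (timestamp, payload) pairs in timestamp order, tolerating
    events that arrive up to `max_delay` time units late."""
    heap = []
    tiebreak = count()            # avoids comparing payloads on equal timestamps
    max_seen = float("-inf")
    for ts, payload in events:
        max_seen = max(max_seen, ts)
        watermark = max_seen - max_delay
        if ts < watermark:
            continue              # too late to place in order; dropped in this sketch
        heapq.heappush(heap, (ts, next(tiebreak), payload))
        # Everything at or below the watermark can no longer be overtaken.
        while heap and heap[0][0] <= watermark:
            ts_out, _, payload_out = heapq.heappop(heap)
            yield ts_out, payload_out
    while heap:                   # flush the buffer when the stream ends
        ts_out, _, payload_out = heapq.heappop(heap)
        yield ts_out, payload_out


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d"), (0, "late")]
ordered = list(reorder(arrivals, max_delay=2))
print(ordered)
```

The trade-off is latency against completeness: a larger `max_delay` tolerates more disorder but delays every downstream decision by that amount.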


STORAGE OPTIONS FOR ANALYTICS ON BIG DATA AT REST 


   

 

 

   

Analytical RDBMSs Appliances


 

 

 

 

 

 

 

   

Hadoop Solutions

 

    

   

 

   

     

There are multiple storage options for supporting big data analytics on data at rest

Analytical RDBMS appliances are hardware/software offerings specifically optimised for analytical processing

Analytical RDBMS appliances have been continually enhanced over the years

The Hadoop stack enables batch analytic applications to use thousands of compute nodes to process petabytes of data stored in a distributed file system


NoSQL DBMSs

Which Storage Option Is Best?

   

 

 

Hive is a data warehouse system for Hadoop that provides a mechanism to project structure on Hadoop data

Hive provides an interface whereby SQL can be converted into Map/Reduce programs

Mahout offers a whole library of analytics that can exploit the full power of a Hadoop cluster

Hadoop is well suited to exploratory analysis of un-modelled multi-structured data

Mahout analytics can be applied to Hadoop data and the results stored in Hive

Hive provides an interface to make data available to SQL developers and tools

Graph databases are one type of NoSQL data store particularly well suited to social network link analysis

Hadoop is suited to analysing un-modelled data or very large volumes of structured data in batch
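To make the idea of converting SQL into Map/Reduce programs more concrete, the sketch below shows how a simple aggregation such as `SELECT page, COUNT(*) FROM weblog GROUP BY page` decomposes into map, shuffle and reduce steps. It is a conceptual illustration in plain Python running in a single process; it is not Hive's actual query planner, and the `weblog` records and function names are hypothetical.

```python
from collections import defaultdict

# Query being sketched: SELECT page, COUNT(*) FROM weblog GROUP BY page


def map_phase(rows):
    """Map: emit one (key, 1) pair per input row, keyed on the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Shuffle: bring together all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: apply the aggregate (COUNT here) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical web log records.
weblog = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/pricing"},
    {"user": "u1", "page": "/pricing"},
    {"user": "u3", "page": "/home"},
    {"user": "u2", "page": "/home"},
]

page_counts = reduce_phase(shuffle(map_phase(weblog)))
print(page_counts)
```

On a real cluster the map and reduce steps would run in parallel across many nodes, with the framework handling the shuffle between them.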


SCALABLE DATA MANAGEMENT OPTIONS FOR BIG DATA AT REST 

 

 

 

 

 

 

 

An analytical RDBMS is suited to complex analysis of structured data and to data warehousing systems that do not have heavy mixed workloads

It is important to match the data characteristics and analytical workload to the technology to select the best platform
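The matching guidance in this section can be summarised as a simple rule of thumb. The sketch below encodes it in Python purely as an illustration: the function `suggest_platform`, its parameters and the workload labels are hypothetical, and a real selection would weigh many more factors such as cost, skills and existing infrastructure.

```python
def suggest_platform(data_in_motion, modelled, workload):
    """Return a candidate platform for a workload, following the rules of
    thumb in this section. The workload labels are illustrative assumptions."""
    if data_in_motion:
        # Analysis must happen before the data is stored anywhere.
        return "event stream processing"
    if workload == "link_analysis":
        # Graph databases suit social network link analysis.
        return "graph database (NoSQL)"
    if not modelled or workload == "large_batch":
        # Un-modelled data, or very large structured volumes in batch.
        return "Hadoop"
    # Complex analysis of structured data.
    return "analytical RDBMS"


print(suggest_platform(data_in_motion=False, modelled=False, workload="exploration"))
print(suggest_platform(data_in_motion=False, modelled=True, workload="data_mining"))
```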

A common suite of tools for information management across all analytical data stores is desirable in the extended analytical environment


OPTIONS FOR ANALYSING BIG DATA 

 

 

 

 

 

 

Information management needs to consolidate data to load analytical data stores AND also move data between data stores

Information management suites need to integrate with Hadoop, NoSQL DBMSs, data warehouses, analytical RDBMSs and MDM

A number of options are available to analyse big data at rest

Custom-built map/reduce applications to analyse data in Hadoop

Pre-built Mahout analytics in Hadoop

Pre-built analytic applications that use map/reduce in Hadoop

New BI tools that generate map/reduce jobs in Hadoop


In-database analytics in analytical RDBMSs

SQL-based BI tools accessing Hadoop data via Hive or accessing RDBMSs

Search-based BI tools and applications that use indexes to analyse data in Hadoop and/or analytical RDBMSs


INTEGRATING BIG DATA INTO YOUR TRADITIONAL DW/BI ENVIRONMENT

THE NEW ENTERPRISE ANALYTICAL ECOSYSTEM 

Traditional data warehouse environments need to be extended to support big data analytical workloads

Information management has a major role in keeping this environment integrated


JOINED UP ANALYTICAL PROCESSING – THE POWER OF WORKFLOW


Data virtualization simplifies access to multiple analytical data stores

Master data management provides consistent master data to all analytical platforms

Information management workflows can be turned into analytical processes that operate across the entire analytical ecosystem

This speeds up the rate at which organisations can consume, analyse and act on data

Powerful new analytical workflows can be used to retain customers and sharpen competitive edge


TECHNOLOGY REQUIREMENTS FOR THE NEW ANALYTICAL ECOSYSTEM 

 


 

 

   

 

 

 

 

   

   

 

 

 

   

Multiple analytical data stores in addition to the enterprise data warehouse

Integration of information management tools with all analytical data stores and event stream processing

Data virtualization to simplify access to data

Best-fit query optimization and in-datastore analytics

Deployment of models in multiple analytical data stores as well as event stream processing

Analytical workflows with full decision management


Sandboxes for exploratory analytics by data scientists

Governance of the entire ecosystem including sandboxes and data scientists

New tools and traditional tools working side-by-side to solve business problems


GETTING STARTED: AN ENTERPRISE STRATEGY FOR BIG DATA ANALYTICS

BUSINESS ALIGNMENT 

WORKLOAD ALIGNMENT WITH ANALYTICAL PLATFORM

SKILL SETS 

Big data projects need to be aligned to business strategy

Identify candidate big data projects and prioritise them based on business benefit

Match the analytical workload with the analytical platform best suited for the job

Data scientists are new people that need to be recruited

Data scientists are self-motivated, analytically inquisitive people with a strong mathematical background and a thirst for data

Traditional ETL developers and business analysts need to broaden their skills to embrace big data platforms as well as data warehouses


CREATE AN ENVIRONMENT FOR DATA SCIENCE AND EXPLORATION 

DEFINE ANALYTICAL PATTERNS AND WORKFLOWS 

INTEGRATE TECHNOLOGY TO TRANSITION TO THE BIG DATA ENTERPRISE 

   

 

 

 

 

 

 

Governed sandboxes are needed by data scientists wishing to conduct investigative analysis on big data

Event stream processing and Hadoop-based analytics are often upstream from data warehouses

Use big data insights to enrich data in a data warehouse

Technologies need to be added to and integrated with traditional data warehouse environments to create a new enterprise analytical ecosystem that caters for all analytical workloads


VENDOR EXAMPLE: IBM’S END-TO-END SOLUTION FOR BIG DATA

   

   

   

   

   

 

 

   

 

   

 

   

 

IBM provides a range of technology components for end-to-end analytics on data in motion and data at rest

This includes a data warehouse and a range of analytical appliances

Information management tools to govern and manage data

All of these components are included in the IBM Big Data Platform

Three analytical engines in the IBM Big Data Platform


IBM INFOSPHERE STREAMS – ANALYSING BIG DATA IN MOTION 

The IBM Big Data Platform is IBM’s name for the enterprise analytical ecosystem

IBM InfoSphere Streams offers continuous real-time analysis of data-in-motion

IBM InfoSphere Streams ships with pre-built toolkits and connectors to expedite development of real-time analytic applications


IBM APPLIANCES FOR ANALYSING DATA AT REST 

IBM InfoSphere BigInsights

     

 

 

 

 

 

   

 

 

 

   

 

 

IBM InfoSphere Streams can be used to continually ingest data into the IBM BigInsights Hadoop system for further analysis

IBM InfoSphere BigInsights is IBM’s commercial distribution of Hadoop

A lot has been done to enhance Hadoop to make it more robust

IBM InfoSphere BigInsights can support third-party Hadoop distributions as well as IBM’s own


IBM PureData System for Analytics (powered by Netezza technology)

 

* With Revolution R Enterprise software from Revolution Analytics

IBM PureData System for Operational Analytics

IBM PureData System for Analytics is optimised for advanced analytics on structured data and for some data warehouse workloads

IBM Netezza Analytics provides in-database analytics capabilities and comes free in every IBM Netezza 1000 or PureData System for Analytics, allowing you to create and apply complex and sophisticated analytics right inside the appliance.

IBM PureData System for Operational Analytics is a modular pre-integrated platform optimized for operational analytic data workloads

DB2 10 includes a NoSQL Graph store


IBM Big Data Platform Accelerators

 

   

   

 

 

 

IBM DB2 Analytic Accelerator (IDAA)

 

IBM INFORMATION MANAGEMENT FOR THE BIG DATA ENTERPRISE 

SmartConsolidation  

IBM Big Data Accelerators are designed to speed up development on the IBM Big Data Platform

IBM DB2 Analytics Accelerator offloads complex analytical queries from OLTP systems running DB2 mixed workloads on IBM System z

IBM InfoSphere Information Server and Foundation Tools provide end-to-end data management across all data stores in the IBM Big Data Platform

IBM uses InfoSphere Blueprint Director to create smart workflows that govern data cleansing, data integration, data privacy and data movement


IBM ANALYTICAL TOOLS FOR THE BIG DATA ENTERPRISE

IBM BigSheets

IBM Cognos 10

 

 

 

IBM Cognos Real-time Monitoring 

Data virtualization is also included in IBM’s Information Integration and Governance platform

BigSheets enables business users to analyse data in Hadoop

BigSheets can import data into BigInsights Hadoop from internal and external data sources

Many Eyes is also included to improve visualisation

IBM has integrated its Cognos BI tool suite with BigInsights via Hive and also with IBM Netezza and IBM Smart Analytics System, now a part of the PureData Systems family

IBM Cognos RTM can analyse filtered event data fed to it by InfoSphere Streams for real-time exception monitoring


HOW THEY FIT TOGETHER FOR END-TO-END BUSINESS INSIGHT 

All of these come together to extend traditional data warehouse environments to create an enterprise analytical ecosystem that processes traditional and big data workloads


CONCLUSION 

 


 

Business is now demanding more analytical power to analyse new sources of structured and multi-structured data

New technologies have emerged to support specific analytical workloads

Traditional data warehouse environments now need to be extended to accommodate these new big data analytical workloads

The IBM Big Data Platform rises to the challenge to create this new analytical environment

The IBM Big Data Platform makes IBM a serious contender to support end-to-end analytical workloads


About Intelligent Business Strategies

intelligent business 

Author

   

INTELLIGENT BUSINESS STRATEGIES

   

Architecting a Big Data Platform for Analytics

©