Capturing Big Value in Big Data

This document is offered compliments of BSP Media Group. www.bspmediagroup.com All rights reserved.

Upload: bsp-media-group

Post on 27-Jan-2015

Category:

Technology


TRANSCRIPT

Page 1: Capturing big value in big data

Page 2: Capturing big value in big data

T-Systems | Big Data 14.11.2013 1

HADOOP

Capturing Big Value in Big Data

Page 3: Capturing big value in big data

BIG DATA: WHY NOW?

2x: digital data globally doubles every two years [1]

90% of all data is unstructured and cannot be handled with traditional analytics tools [1]

70% of all IT investment in 2015 will be Big Data driven [2]

10-50% cost reduction in production through Big Data exploitation [4]

85% of Top 500 enterprises will fail to exploit Big Data [2]

>30% of enterprises have no formal concept for data management [5]

[1] IDC Predictions 2012

[2] Gartner, Predicts 2012

[3] Wikibon 2012, Big Data Market Size and Vendor Revenues

[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity

[5] Economist Intelligence Unit 2011, Big data. Harnessing a game-changing asset
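
As a rough reading of the first figure on this slide: if data volume doubles every two years, the growth factor after n years is 2^(n/2). The short sketch below only works through that arithmetic; the function name and the chosen time horizons are illustrative assumptions, not part of the presentation.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth after `years`, assuming volume doubles
    once every `doubling_period_years` (two years, per the slide)."""
    return 2.0 ** (years / doubling_period_years)


# Illustrative horizons (assumed, not from the slide).
for years in (2, 4, 10):
    print(f"after {years} years: x{growth_factor(years):g}")
```

Under this assumption, ten years of growth corresponds to a 32-fold increase in volume.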

Page 4: Capturing big value in big data

THE BI ECOSYSTEM ACCORDING TO FORRESTER

Page 5: Capturing big value in big data

THE 2012 GARTNER HYPE CYCLE FOR BIG DATA: IN-MEMORY ANALYTICS APPROACHING MAINSTREAM ADOPTION

Page 6: Capturing big value in big data

POSITIONING HADOOP, NOVEMBER 2013: HADOOP APPROACHING MAINSTREAM ADOPTION

Page 7: Capturing big value in big data

HADOOP VS IN-MEMORY ANALYTICS

In-memory analytics (IMA) is the Ferrari: sexy, very fast, but with limited luggage space.

Hadoop (with Impala) is a fleet of MPVs: good performance and capacity, easy to drive, affordable.

Hadoop (without Impala) is a fleet of long-haul trucks: moderate performance, excellent capacity, needs a specialist driver's license and drives overnight.

How fast do you want your delivery made? What is being delivered? How much do you want to spend? Do you have specialist drivers?
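
To give a feel for the "specialist driver's license" that plain MapReduce requires, here is a minimal, single-process sketch of the map, shuffle and reduce phases, counting log lines per severity level. It is not Hadoop code and does not use any Hadoop API; the log format, the sample lines and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample input; the "date time LEVEL message" layout is assumed.
LOG_LINES = [
    "2013-11-14 10:00:01 ERROR payment timeout",
    "2013-11-14 10:00:02 INFO user login",
    "2013-11-14 10:00:03 ERROR payment timeout",
    "2013-11-14 10:00:04 WARN disk usage high",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (log_level, 1) for every well-formed line."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence, in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(run_job(LOG_LINES))
```

On a real cluster the map and reduce functions run in parallel across many nodes and the framework performs the shuffle; an SQL-on-Hadoop engine such as Impala lets an analyst express the same count as a single GROUP BY query instead.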

Some Hadoop Improvements

• With its ecosystem of contributors and distributions, Hadoop is becoming easier and easier to use, e.g. Cloudera's Impala, Microsoft's HDInsight, MapR's Drill, Hortonworks' Stinger Initiative

• With Cloudera's Hadoop offering, when you buy the trucks they throw in the MPVs for free

• Hadoop 2.0 brings YARN, graph analysis and stream processing

• With the pace of improvement in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is going to narrow fairly soon, e.g. from Hive 0.10 to 0.11, with the new ORC file format (the successor to RCFile), there is a performance boost of more than 10x

Page 8: Capturing big value in big data

HADOOP INNOVATION #1: MUCH CHEAPER STORAGE

                    SAN Storage          NAS File Servers      Local Storage

Cost per gigabyte   $2 - $10             $1 - $5               <$0.50

$1M gets:

  Capacity          0.5 petabytes        1 petabyte            10 petabytes

  IOPS              200,000              200,000               400,000

  Throughput        8 GB/s               10 GB/s               250 GB/s

Software by         HDS                  NetApp                open source Hadoop ecosystem

Hardware            bundled, by HDS      bundled, by NetApp    self-assembled
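
The "$1M gets" capacities on this slide can be cross-checked against the quoted per-gigabyte price ranges. The sketch below does that arithmetic; it assumes decimal units (1 PB = 1,000,000 GB), and the dictionary keys and function name are illustrative only.

```python
GB_PER_PB = 1_000_000   # assumption: decimal units, 1 PB = 1,000,000 GB
BUDGET_USD = 1_000_000  # the "$1M" budget from the slide

# Capacity bought for the budget, in petabytes (figures from the slide).
CAPACITY_PB = {
    "SAN storage": 0.5,
    "NAS file servers": 1.0,
    "Local storage (Hadoop)": 10.0,
}


def cost_per_gb(budget_usd: float, capacity_pb: float) -> float:
    """Implied price in USD per gigabyte for a given budget and capacity."""
    return budget_usd / (capacity_pb * GB_PER_PB)


for name, capacity in CAPACITY_PB.items():
    print(f"{name}: ${cost_per_gb(BUDGET_USD, capacity):.2f}/GB")
```

The implied prices ($2.00, $1.00 and $0.10 per gigabyte) sit at or below the low end of each quoted range. The slide does not say whether the local-storage figure accounts for HDFS block replication (three copies by default), which would raise the effective cost per usable gigabyte.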

Page 9: Capturing big value in big data

HADOOP INNOVATION #2: STORE FIRST, QUESTIONS LATER

Technology Solution: Legacy BI

• Business problem: backward-looking analysis, using data out of business applications

• Selected vendors: SAP Business Objects, IBM Cognos, MicroStrategy

• Data type/scalability: structured; limited (2 – 3 TB in RAM)

Technology Solution: High performance BI

• Business problem: quasi-real-time, in-memory analysis, using data out of business applications; complex event processing

• Selected vendors: Oracle Exadata, SAP HANA

• Data type/scalability: structured; limited (1 PB in RAM)

Technology Solution: "Hadoop" Ecosystem

• Business problem: batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources

• Selected vendors: Cloudera Hadoop, Hortonworks Hadoop, Microsoft Hadoop

• Data type/scalability: structured or unstructured; quasi unlimited (20 – 30 PB)

Page 10: Capturing big value in big data

GARTNER HYPE CYCLE FOR ANALYTIC APPLICATIONS: A GREAT STARTING POINT FOR BI AND BIG DATA USE CASES

Page 11: Capturing big value in big data

IMPLEMENTING HADOOP TO GENERATE PROFIT: SELECTED USE CASES

Intelligent News Discovery

• Research and analysis of video, audio and online print

• Semantic analyses and results visualization

Security Analytics

• Print queue analysis for confidential and/or sensitive documents

• Email analysis

• Comprehensive monitoring of unlimited data volumes and types

Smarter Procurement

• Transparency across all suppliers and prices

• Stronger negotiating position in purchasing

• Efficient cashflow management

Metro Traffic Diagnostics

• Analysis of traffic situations

• Improved planning and local resident satisfaction

• Big event optimisation

Campaign Analytics

• Monitoring of marketing campaigns

• Consideration of all sources and formats

• Efficient campaign management

Efficient Fleet Management

• Driving tips for drivers

• Competitive advantage thanks to cost reductions

• Lower fuel consumption and CO2 emissions

• Better planning of routes and cargo loads

Smarter Energy Management

• Optimized use of resources for all energy sources

• Future utilisation forecasts

• Feeds into customer-specific pricing

Page 12: Capturing big value in big data

HADOOP USE CASES BY BUSINESS FUNCTION

Business functions: Marketing & Sales; Product Development & Research; Product Service & Support; Distribution & Logistics; Finance & Controlling

Use cases:

• Supply chain optimization, controlling own and OEM production capacity

• Production optimization using sensor data and machine-to-machine (M2M) communication

• Using online forums for product development and sentiment analysis

• Social media usage for macro/micro trend analysis

• Massively parallel processing for drug testing in pharma

• CERN number crunching for test data (40 GB/s)

• Predictive maintenance and prediction (combat unwanted production stops)

• Production planning for seasonal goods (multi-factor)

• Truck transportation optimization (transport order navigational data, combined with traffic data)

• Road charge optimization (real-time adaptation of fees according to current traffic)

• Customer-individual discounts for products on websites and in call centers (multi-factor, real-time)

• Financial simulation and scenario calculations

• Online fraud detection (credit card transactions, etc.)

• Risk controlling (market risk/value at risk)

• Detection of unknown financial risk (e.g. for real estate loans)

• Optimize target group marketing for online banking based on trading/depot transactions

• Online marketing campaign optimization

• Big Data for point-of-sale optimization/cross-selling

• Competitive analysis using online press and social media, with scraping and text analysis

• Customer churn analysis for prepaid telco business (behavior-based)

Page 13: Capturing big value in big data

WHAT ARE THE PREREQUISITES FOR DERIVING EFFECTIVE VALUE FROM HADOOP?

The foundation is a Data Strategy

• Map data to business value: which data is required to deliver on a value statement or answer a fundamental business question?

• Categorise critical vs non-critical data: critical data is not only the data identified by the business value question above, but also data that could or should have long-term (potential) value and is typically used across multiple business processes or a value chain. Master Data Management is a key activity here.

• Define your data ecosystem: not only the technology, but also the processes and the responsibilities matched to roles, built around three core capabilities: data, insight and action

• Data Governance

  - Define the appropriate data roles in the organisation

  - The governance structure must be federated: a central governing body addresses the most important, common data, while most of the data is managed locally in the lines of business

  - Improve data quality

  - Improve data accessibility

Page 14: Capturing big value in big data

SOME NEW ROLES IN DATA/ANALYTICS: THE COMING OF AGE OF DATA IN THE ENTERPRISE

The Data Scientist

The Chief Data Officer

Data Hygienist/Data Steward

Data Explorer

Business Solution Architect/Domain Expert

Campaign Expert

Data Security Officer

50% Big Data talent gap expected until 2018 [4]

[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity

Page 15: Capturing big value in big data

MANY ORGANISATIONS RESEMBLE THIS TODAY: HOW DOES HADOOP COMPLEMENT EXISTING INVESTMENTS IN BUSINESS INTELLIGENCE?

[Diagram. Existing data sources: business applications (ERP, CRM, etc.) and transactional OLTP DBMS. Data integration: ETL. Storage: data warehouse, data mart, cube, appliance. Business intelligence tools and analytical applications: reporting, dashboard, OLAP, data mining.]

Page 16: Capturing big value in big data

HADOOP COMPLEMENTS EXISTING BI INVESTMENT

[Diagram. Existing data sources: business applications (ERP, CRM, etc.) and transactional OLTP DBMS. New data sources: Hadoop, NoSQL, log data, cloud/SaaS; static data and flowing data; structured and unstructured data. Data integration: ETL, plus real-time data processing and analysis and complex event processing. Storage: data warehouse, data mart, cube, appliance. Business intelligence tools and analytical applications: reporting, dashboard, OLAP, operational intelligence, data and text mining, predictive analytics.]

Page 17: Capturing big value in big data

HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN AND TECHNOLOGY SELECTION

USE CASE: Real-time reporting of SAP OLTP data, including joins and data transformations
POTENTIAL TOOL: SAP HANA

USE CASE: Summarise unstructured data logs (scheduled)
POTENTIAL TOOL: Hadoop MapReduce

USE CASE: Real-time reporting of summarised data logs, with joins to other non-OLTP data
POTENTIAL TOOL: Impala

USE CASE: Near-real-time reporting of social media data
POTENTIAL TOOL: Impala + Hadoop MapReduce (scheduled to collect recent social media data)

USE CASE: Real-time reporting of recent OLTP data joined with recent social media data
POTENTIAL TOOL: HANA + Hadoop MapReduce (scheduled to collect recent social media data and load it into HANA)

USE CASE: Image analysis processing (scheduled)
POTENTIAL TOOL: Hadoop MapReduce (scheduled job runs sophisticated analysis of video files and stores results in a structured file)

USE CASE: Image analysis reporting
POTENTIAL TOOL: Impala (to report on results file)

USE CASE: Predictive analysis reporting (comparing OLTP and non-OLTP data)
POTENTIAL TOOL: HANA + Hadoop MapReduce (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
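
The pattern behind these pairings can be approximated by a few rules: scheduled batch work goes to MapReduce, interactive reporting that touches OLTP data goes to HANA (fed by MapReduce when non-OLTP data is involved), and interactive reporting on non-OLTP data goes to Impala. The sketch below is one possible reading of the slide, not a rule set stated in the presentation; the `UseCase` fields and the `suggest_tools` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of a reporting or analysis requirement."""
    interactive: bool                   # real-time or near-real-time answers needed?
    uses_oltp_data: bool                # reads SAP/OLTP data?
    uses_non_oltp_data: bool            # reads logs, social media, image results, ...
    needs_scheduled_prep: bool = False  # raw data must first be collected or summarised


def suggest_tools(uc: UseCase) -> List[str]:
    """Return candidate tools, following one reading of the slide's pairings."""
    if not uc.interactive:
        # Scheduled batch processing.
        return ["Hadoop MapReduce"]
    if uc.uses_oltp_data:
        # Interactive reporting on OLTP data; MapReduce loads any non-OLTP data.
        tools = ["SAP HANA"]
        if uc.uses_non_oltp_data:
            tools.append("Hadoop MapReduce")
        return tools
    # Interactive reporting on non-OLTP data only.
    tools = ["Impala"]
    if uc.needs_scheduled_prep:
        tools.append("Hadoop MapReduce")
    return tools


# Examples mirroring three of the slide's rows.
print(suggest_tools(UseCase(True, True, False)))        # SAP OLTP reporting
print(suggest_tools(UseCase(False, False, True)))       # scheduled log summary
print(suggest_tools(UseCase(True, False, True, True)))  # social media reporting
```

Real selection would also weigh data volume, licensing cost and available skills, which the slide covers only implicitly.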

Page 18: Capturing big value in big data

HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN AND TECHNOLOGY SELECTION

Page 19: Capturing big value in big data

SUMMARY

Data Volumes are here to stay

Hadoop is getting more powerful, more realtime and easier to use

Hadoop is not your Big Data answer – it is part of your BI and Big Data ecosystem

An Enterprise Data Strategy and Data Governance are critical to success

Make sure you have two conversations in your enterprise

• A Business Conversation about the business values from your BI Ecosystem

• An IT Conversation to ensure your IT Organisation understands the new world of BI: the strengths, shortcomings and roles of the component technologies

“What matters is how — and why — vastly more data leads to vastly greater value creation. Designing and determining those links is typically in the province of top management”

but needs to be facilitated by the IT Organisation in Business terms

Page 20: Capturing big value in big data

A PARTING THOUGHT: HADOOP (AND BIG DATA) IS 4 V'S, NOT JUST 3

ANALYTICS creates VALUE: value comes from knowing more than the rest

Page 21: Capturing big value in big data

Backup

Page 22: Capturing big value in big data

AGENDA

Where are we with Big Data and Hadoop at the end of 2013?

What is the disruptive innovation in Hadoop?

What are target use cases, horizontally and telco-specific?

How do you start realizing value from Hadoop today?

What are the prerequisites for deriving effective value from Hadoop?

How does Hadoop complement existing investments in business intelligence?

How use case segmentation drives solution design and technology selection

Page 23: Capturing big value in big data

LEARNING THE LANGUAGE OF BIG DATA

[Word cloud of Big Data terms]

Jaspersoft, Karmasphere Studio, Talend, Pentaho, Continuity

NoSQL: MongoDB, Cassandra, CouchDB, Redis, Riak, Neo4j

Platfora, Tableau, Splunk, Shep

Hadoop: MapReduce, ZooKeeper, Avro, Nutch, HDFS, Hive, Pig, HBase, Chukwa

Matlab, R, Python, JRuby, Ruby, Java, C++

Kafka, InfoChimps, Skytree, GreenPlum, Aster, GoPivotal

Page 24: Capturing big value in big data

LEARNING THE LANGUAGE OF BIG DATA