capturing big value in big data
This document is offered compliments of BSP Media Group. www.bspmediagroup.com
All rights reserved.
T-Systems | Big Data 14.11.2013 1
HADOOP
Capturing Big Value in Big Data
BIG DATA: WHY NOW?
x2: digital data globally doubles every two years [1]
90% of all data is unstructured and cannot be handled with traditional analytics tools [1]
70% of all IT investment in 2015 will be Big Data driven [2]
10-50% cost reduction in production through Big Data exploitation [4]
85% of Top 500 enterprises will fail to exploit Big Data [2]
>30% of enterprises have no formal concept for data management [5]
[1] IDC Predictions 2012
[2] Gartner, Predicts 2012
[3] Wikibon 2012, Big Data Market Size and Vendor Revenues
[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
[5] Economist Intelligence Unit 2011, Big data. Harnessing a game-changing asset
THE BI ECOSYSTEM ACCORDING TO FORRESTER
THE 2012 GARTNER HYPE CYCLE FOR BIG DATA: IN-MEMORY ANALYTICS APPROACHING MAINSTREAM ADOPTION
POSITIONING HADOOP, NOVEMBER 2013: HADOOP APPROACHING MAINSTREAM ADOPTION
HADOOP VS IN-MEMORY ANALYTICS
IMA (in-memory analytics) is the Ferrari: sexy and very fast, but with limited luggage space
Hadoop (with Impala) is a fleet of MPVs: good performance and capacity, easy to drive, affordable
Hadoop (without Impala) is a fleet of long-haul trucks: moderate performance, excellent capacity, needs a specialist driver's license and drives overnight
How fast do you want your delivery made? What is being delivered? How much do you want to spend? Do you have specialist drivers?
Some Hadoop Improvements
• With its ecosystem of contributors and distributions, Hadoop is becoming steadily easier to use, e.g. Cloudera's Impala, Microsoft's HDInsight, MapR's Drill, Hortonworks' Stinger Initiative
• With Cloudera's Hadoop offering, when you buy the trucks they throw in the MPVs for free
• Hadoop 2.0 brings YARN, Graph Analysis and Stream Processing
• With the speed of improvements in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is going to narrow fairly soon, e.g. from Hive 0.10 to 0.11, the new ORC file format brings a performance boost of >10x
HADOOP INNOVATION #1: MUCH CHEAPER STORAGE
                  SAN Storage         NAS File Servers      Local Storage
Cost per GB       $2 - $10            $1 - $5               <$0.50
$1M gets:
  Capacity        0.5 PB              1 PB                  10 PB
  IOPS            200,000             200,000               400,000
  Throughput      8 GB/s              10 GB/s               250 GB/s
Software by       HDS                 NetApp                open source Hadoop ecosystem
Hardware          bundled, by HDS     bundled, by NetApp    self-assembled
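The "$1M gets" capacities above follow from simple price-per-gigabyte arithmetic. The sketch below is only an illustration of that calculation, assuming decimal units (1 PB = 1,000,000 GB); the function names are hypothetical and the prices are the ones quoted on the slide.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB


def capacity_pb(budget_usd: float, usd_per_gb: float) -> float:
    """Capacity in petabytes that a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB


def implied_usd_per_gb(budget_usd: float, capacity_in_pb: float) -> float:
    """Price per gigabyte implied by a budget and the capacity it bought."""
    return budget_usd / (capacity_in_pb * GB_PER_PB)


BUDGET = 1_000_000  # the $1M used on the slide

# Best-case (lowest) prices quoted on the slide for the bundled options.
for name, low_price in [("SAN storage", 2.0), ("NAS file servers", 1.0)]:
    print(f"{name}: up to {capacity_pb(BUDGET, low_price):.1f} PB for $1M")

# The slide quotes 10 PB of local storage for $1M; the implied price
# can be checked against the "<$0.50/Gigabyte" claim.
print(f"Local storage: ${implied_usd_per_gb(BUDGET, 10):.2f}/GB implied")
```

At the low end of each quoted range the results match the slide: $2/GB buys 0.5 PB and $1/GB buys 1 PB, while 10 PB for $1M implies a price well below the quoted $0.50/GB ceiling.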
HADOOP INNOVATION #2: STORE FIRST, QUESTIONS LATER
Technology Solution: Legacy BI
  Business Problem: Backward-looking analysis, using data out of business applications
  Selected Vendors: SAP Business Objects, IBM Cognos, MicroStrategy
  Data Type/Scalability: Structured; limited (2 – 3 TB in RAM)
Technology Solution: High performance BI
  Business Problem: Quasi-real-time, in-memory analysis, using data out of business applications; Complex Event Processing
  Selected Vendors: Oracle Exadata, SAP HANA
  Data Type/Scalability: Structured; limited (1 PB in RAM)
Technology Solution: "Hadoop" Ecosystem
  Business Problem: Batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
  Selected Vendors: Cloudera Hadoop, Hortonworks Hadoop, Microsoft Hadoop
  Data Type/Scalability: Structured or unstructured; quasi unlimited (20 – 30 PB)
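"Store first, questions later" is often described as schema-on-read: raw records are kept as they arrive, and a structure is applied only when a question is asked. The following is a minimal sketch of that idea in plain Python, not Hadoop code; the sample events, field names and function names are all hypothetical.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated on write.
RAW_EVENTS = [
    '{"ts": "2013-11-14T10:00:00", "user": "a", "action": "view", "page": "/home"}',
    '{"ts": "2013-11-14T10:01:00", "user": "b", "action": "buy", "amount": 20.0}',
    'not valid json at all',
    '{"ts": "2013-11-14T10:02:00", "user": "a", "action": "buy", "amount": 5.5}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: yield only records carrying every requested field."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input stays in storage but is skipped for this question
        if isinstance(record, dict) and all(f in record for f in fields):
            yield {f: record[f] for f in fields}


def revenue_per_user(lines):
    """A question defined after the data was stored: total purchase amount per user."""
    totals = {}
    for row in read_with_schema(lines, ("user", "amount")):
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


print(revenue_per_user(RAW_EVENTS))
```

A different question, for example page views per user, would reuse the same stored lines with a different field list, which is the contrast with a warehouse where the schema is fixed before loading.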
GARTNER HYPE CYCLE FOR ANALYTIC APPLICATIONS: A GREAT STARTING POINT FOR BI AND BIG DATA USE CASES
IMPLEMENTING HADOOP TO GENERATE PROFIT: SELECTED USE CASES
Intelligent News Discovery
• Research and analysis of video, audio and online print
• Semantic analyses and results visualization
Security Analytics
• Print Queue analysis for confidential and/or sensitive documents
• Email Analysis
• Comprehensive monitoring of unlimited data volumes and types
Smarter Procurement
• Transparency across all suppliers and prices
• Stronger negotiating position in purchasing
• Efficient cashflow management
Metro Traffic Diagnostics
• Analysis of traffic situations
• Improved planning and local resident satisfaction
• Big Event optimisation
Campaign Analytics
• Monitoring of marketing campaigns
• Consideration of all sources and formats
• Efficient campaign management
Efficient Fleet Management
• Driving tips for drivers
• Competitive advantage thanks to cost reductions
• Lower fuel consumption and CO2 emissions
• Better planning of routes and cargo loads
Smarter Energy Management
• Optimized use of resources for all energy sources
• Future utilisation forecasts
• Feeds into customer-specific pricing
HADOOP USE CASES BY BUSINESS FUNCTION
Business functions: Marketing & Sales; Product Development & Research; Product Service & Support; Distribution & Logistics; Finance & Controlling
• Supply Chain Optimization controlling own and OEM production capacity
• Production Optimization using Sensor Data and Machine 2 Machine Communication
• Using Online Forums for Product Development & Sentiment Analysis
• Social Media Usage for Macro/Micro Trend analysis
• Massive Parallel Processing for Drug Testing in Pharma
• CERN number crunching for test data (40GB/sec)
• Predictive Maintenance & Prediction (Combat unwanted production stops)
• Production Planning for Seasonal Goods (multi factor)
• Truck transportation optimization (transport order navigational data, combined with traffic data)
• Road Charge Optimization (real time adaptation of fees according to current traffic)
• Customer Individual Discounts for products on websites and call centers (multi factor, real time)
• Financial Simulation and Scenario Calculations
• Online Fraud Detection (Credit Card transactions, etc.)
• Risk Controlling (Market Risk/Value at Risk)
• Detection of unknown financial risk (e.g. for real estate loans)
• Optimize Target Group Marketing for online banking based on trading/depot transactions
• Online Marketing Campaign Optimization
• Big Data for Point of Sales Optimization/Cross Selling
• Competitive Analysis using Online Press, Social Media with Scraping and Text Analysis
• Customer Churn Analysis for Prepaid Telco business (behavior based)
WHAT ARE THE PREREQUISITES FOR DERIVING EFFECTIVE VALUE FROM HADOOP?
Foundation is a Data Strategy
• Map Data to Business Value – which data is required to deliver on a value statement or answer a fundamental business question
• Categorise critical vs non-critical Data – critical data is not only the data identified in the Business Value question above, but also data that could or should have long-term (potential) value and is typically used across multiple business processes or a value chain. Master Data Management is a key activity here
• Define your Data Ecosystem – not only the technology but also the processes and responsibilities matched to roles, and three core capabilities: data, insight and action
• Data Governance
  - Define the appropriate Data Roles in the organisation
  - The governance structure must be federated, with a central governing body addressing the most important, common data, and most of the data managed locally in the lines of business
  - Improve Data Quality
  - Improve Data Accessibility
SOME NEW ROLES IN DATA/ANALYTICS: THE COMING OF AGE OF DATA IN THE ENTERPRISE
The Data Scientist
The Chief Data Officer
Data Hygienist/Data Steward
Data Explorer
Business Solution Architect/Domain Expert
Campaign Expert
Data Security Officer
50% Big Data talent gap expected until 2018 [4]
[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
MANY ORGANISATIONS RESEMBLE THIS TODAY: HOW DOES HADOOP COMPLEMENT EXISTING INVESTMENTS IN BUSINESS INTELLIGENCE?
Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)
Data integration: ETL
Data stores: Data Warehouse, Data Mart, Cube, Appliance
Business Intelligence tools and analytical applications: Reporting, Dashboard, OLAP, Data Mining
HADOOP COMPLEMENTS EXISTING BI INVESTMENT
Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)
New data sources: Log data; Cloud; SaaS; static data and flowing data; structured and unstructured data
Data integration: ETL
Data stores: Data Warehouse, Data Mart, Cube, Appliance; Hadoop, NoSQL
Real-time data processing and analysis: Complex event processing
Business Intelligence tools and analytical applications: Reporting, Dashboard, OLAP, Operational Intelligence, Data & Text Mining, Predictive Analytics
HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN AND TECHNOLOGY SELECTION
USE CASE | POTENTIAL TOOL
Real-time reporting of SAP OLTP data, including joins and data transformations | SAP HANA
Summarise unstructured data logs (scheduled) | HADOOP MAP/REDUCE
Real-time reporting of summarised data logs, with joins to other non-OLTP data | IMPALA
Near-real-time reporting of Social Media data | IMPALA + HADOOP MAP/REDUCE (scheduled to collect recent Social Media data)
Real-time reporting of recent OLTP data joined with recent Social Media data | HANA + HADOOP MAP/REDUCE (scheduled to collect recent Social Media data and load into HANA)
Image Analysis Processing (scheduled) | HADOOP MAP/REDUCE (scheduled job runs sophisticated analysis of video files and stores results in a structured file)
Image Analysis Reporting | IMPALA (to report on results file)
Predictive Analysis Reporting (comparing OLTP and non-OLTP data) | HANA + HADOOP MAP/REDUCE (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
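To make the "summarise unstructured data logs" use case concrete, the map/reduce pattern behind it can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle and reduce steps, not the Hadoop API; the log format, the sample lines and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log lines; a real job would read these from files in HDFS.
LOG_LINES = [
    "2013-11-14 10:00:01 ERROR payment timeout",
    "2013-11-14 10:00:02 INFO user login",
    "2013-11-14 10:00:03 ERROR payment timeout",
    "malformed line",
    "2013-11-14 10:00:04 WARN disk almost full",
]

KNOWN_LEVELS = {"ERROR", "WARN", "INFO"}


def map_line(line):
    """Map step: emit (level, 1) for every line whose third token is a known level."""
    parts = line.split(maxsplit=3)
    if len(parts) >= 3 and parts[2] in KNOWN_LEVELS:
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def summarise(lines):
    """Run map, shuffle and reduce over the lines and return counts per level."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


print(summarise(LOG_LINES))
```

The small, structured result of such a scheduled job is what a low-latency tool like Impala or HANA would then report on, which is the split the table above describes.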
SUMMARY
Data Volumes are here to stay
Hadoop is getting more powerful, more realtime and easier to use
Hadoop is not your Big Data answer – it is part of your BI and Big Data ecosystem
An Enterprise Data Strategy and Data Governance are critical to success
Make sure you have two conversations in your enterprise
• A Business Conversation about the business values from your BI Ecosystem
• An IT Conversation to ensure your IT Organisation understands the new world of BI, the shortcomings, the strengths and roles of the component technologies
“What matters is how — and why — vastly more data leads to vastly greater value creation. Designing and determining those links is typically in the province of top management”
but needs to be facilitated by the IT Organisation in Business terms
A PARTING THOUGHT: HADOOP (AND BIG DATA) IS 4 V'S, NOT JUST 3
ANALYTICS creates VALUE: value comes from knowing more than the rest
Backup
AGENDA
Where are we with Big Data and Hadoop at the end of 2013?
What is the disruptive innovation in Hadoop?
What are target use cases, horizontally and telco-specific?
How do you start realizing value from Hadoop today?
What are the prerequisites for deriving effective value from Hadoop?
How does Hadoop complement existing investments in business intelligence?
How use case segmentation drives solution design and technology selection
LEARNING THE LANGUAGE OF BIG DATA
Jaspersoft, Karmasphere Studio, Talend, Pentaho, Continuity
NoSQL: MongoDB, Cassandra, CouchDB, Redis, Riak, Neo4j
Platfora, Tableau, Splunk, Shep
Hadoop: MapReduce, ZooKeeper, Avro, Nutch, HDFS, Hive, Pig, HBase, Chukwa
Matlab, R, Python, JRuby, Ruby, Java, C++
Kafka, InfoChimps, Skytree, GreenPlum, Aster, GoPivotal