Steve Jenkins - Business Opportunities for Big Data in the Enterprise
DESCRIPTION
Steve Jenkins of MapR Technologies: presentation from our Big Data breakfast conference
TRANSCRIPT
Steve Jenkins
MapR Technologies
‘Business Opportunities for Big Data in the Enterprise’
2©MapR Technologies - Confidential
Big Data in the Enterprise
Steve Jenkins VP EMEA MapR Technologies
Business Value
Changing landscape
90% of digital data was created in the last two years
2.7 zettabytes in 2012, predicted to reach 7.9 zettabytes in 2015
• 1 zettabyte = 1,000 exabytes, or 1 billion terabytes
6 billion phone subscriptions
• 87% of the world population
1.011 billion Facebook users
• 604 million users log in from mobile devices monthly
400 million tweets a day
• 84 million access by mobile
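The unit arithmetic behind the data-volume figures can be checked with a short sketch. It assumes decimal (SI) units, which is what the "1,000 Exabytes or 1 Billion Terabytes" equivalence implies; the function and variable names are illustrative and not from the slides.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def zettabytes_to(value_zb, unit_bytes):
    """Convert a volume given in zettabytes to the unit whose size is unit_bytes."""
    return value_zb * ZETTABYTE / unit_bytes


volume_2012_zb = 2.7  # figure quoted on the slide
volume_2015_zb = 7.9  # prediction quoted on the slide

exabytes_per_zb = ZETTABYTE // EXABYTE
terabytes_per_zb = ZETTABYTE // TERABYTE
growth_factor = volume_2015_zb / volume_2012_zb

print(f"1 ZB = {exabytes_per_zb:,} EB = {terabytes_per_zb:,} TB")
print(f"2012: {zettabytes_to(volume_2012_zb, EXABYTE):,.0f} EB")
print(f"2015: {zettabytes_to(volume_2015_zb, EXABYTE):,.0f} EB")
print(f"Growth 2012 to 2015: about {growth_factor:.1f}x")
```

The growth factor is derived here from the two quoted volumes; it does not appear on the slide itself.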
Too much data?
Retail industry
39% infrequent collection, not fast enough
42% could not link data at individual level
45% not using effectively
Only sample 10% of data
“The use of big data will become a key basis of competition and growth for individual firms. In most industries, established competitors and new entrants alike will leverage data-driven strategies to innovate, compete, and capture value from deep and up-to-real-time information.” – McKinsey & Company
“The size, complexity of formats and speed of delivery exceeds the capabilities of traditional data management technologies” – Gartner
"The bringing together of a vast amount of data from public and private sources, combined with the intuition of business and thought leaders and the speed and affordability of today's computers, is what Big Data is all about.”– IDC
Across all verticals, typical use cases
Logistics
Fraud
Loyalty programmes
Sentiment analysis
ETA calculations
Customer insight
Gene sequencing
Operations
Biggest challenges
80% Finding talent
72% Training and education
76% Identifying the correct tools
32% Siloed data, non-cooperation
Identifying the correct resources and use case
• Data Scientist
• Analytical capability
Based on 300 interviews by Infochimps
From Big Data to Insights
General Observations
Analytics becoming a critical component in business environments
Base decisions on data
Work with existing applications
Principle: keep all data around – benefit from all data
– Human generated
– Machine generated
Pioneered at Google and Amazon
Hadoop Growth
The Hadoop Ecosystem
Case studies – unlocking the power of Big Data
Financial Services (customer insights, fraud detection, etc.)
Global Telecommunications - Data Warehouse Augmentation
Petroleum - Trade, Logistics & Transportation
Retail application
Case study – Credit Card Company
[Diagram labels: Credit card transactions; MapR Big Data Platform; Fraud model; Recommendation table; Queries on IT logs; Fraud detection; Personalized offers; Fraud investigation tool; Fraud investigator]
Arrival of Big Data Impacts Data Warehouse
BIG DATA
Volume: Prohibitively expensive storage costs
Variety: Inability to process unstructured formats
Velocity: Faster arrival and processing needs
How can a Data Warehouse leverage Big Data?
Case study – Data Warehouse Augmentation
Problem:
– Major telecom vendor
– Key step in billing pipeline handled by data warehouse (EDW)
– EDW at maximum capacity
– Multiple rounds of software optimization already done
Revenue limiting (= career limiting) bottleneck
Solution: Use MapR to off-load ETL processes that don’t fit EDW capabilities
Data Warehouse Augmentation
[Diagram: Current ETL Pipeline and Hybrid Solution Pipeline, both fed by Billing Systems. Stage labels: Extract, Transform, Clean, Conform, Normalize, Present, Access. Platform labels: Teradata, Hadoop, DataStage.]
Results of TCO Evaluation
CapEx: Cost avoidance for annual Teradata adds
Storage: 20x storage good for next 5 years
Cost: 100x cost reduction
Scale Out Architecture: New nodes can be added on the fly
No Disruption: Hybrid solution ensures no change to upstream/downstream business systems
Solution Technology              5-Yr TCO
Existing Teradata                $66,950,000
New Hybrid: Teradata + Hadoop    $33,000,000
Total Cost Savings               $33,950,000
One Time Hadoop Investment of ~$6.5M Provides $33.9M Cost Savings
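The savings figure follows directly from the two 5-year TCO numbers, as the short sketch below shows. The percentage and the savings-per-dollar ratio are derived here for illustration and do not appear on the slide; the slide also does not say whether the ~$6.5M Hadoop investment is already included in the hybrid TCO, so the ratio should be read as a rough indication only. Variable names are illustrative.

```python
# 5-year TCO figures quoted on the slide (US dollars).
existing_teradata_tco = 66_950_000
hybrid_tco = 33_000_000

# "~$6.5M" on the slide; treated as an approximate value.
hadoop_investment = 6_500_000

savings = existing_teradata_tco - hybrid_tco
savings_pct = 100 * savings / existing_teradata_tco

# Illustrative ratio only: assumes the investment is compared against
# the full 5-year savings, which the slide does not state explicitly.
savings_per_dollar = savings / hadoop_investment

print(f"Total cost savings: ${savings:,}")
print(f"Reduction versus existing Teradata TCO: {savings_pct:.1f}%")
print(f"Savings per dollar of Hadoop investment: about {savings_per_dollar:.1f}")
```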
Use case – Geo-spatial & time series dashboarding
Data sources
– stock transactions
– vessel positions
– weather
Goal: provide aggregated overview + drill-down capability in a dashboard
Batch-generated overview (Hadoop’s MapReduce, HBase/M7)
Interactive, ad-hoc drill-down (HBase/M7, Apache Drill)
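The split between a batch-generated overview and an interactive drill-down can be illustrated with a small in-memory sketch. This is not MapReduce, HBase/M7, or Apache Drill code: plain Python functions stand in for both steps, and the vessel records, the 1-degree grid, and the one-hour window are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical vessel position records: (vessel_id, timestamp, lat, lon).
POSITIONS = [
    ("vessel-a", datetime(2013, 1, 1, 10, 5), 51.92, 4.12),
    ("vessel-b", datetime(2013, 1, 1, 10, 40), 51.95, 4.48),
    ("vessel-a", datetime(2013, 1, 1, 11, 15), 52.10, 4.30),
    ("vessel-c", datetime(2013, 1, 1, 10, 20), 53.54, 9.98),
]


def cell_key(ts, lat, lon):
    """Bucket a position into a 1-degree grid cell and a one-hour window."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return (math.floor(lat), math.floor(lon), hour)


def build_overview(positions):
    """Batch step: count distinct vessels per (grid cell, hour) bucket."""
    vessels = defaultdict(set)
    for vessel_id, ts, lat, lon in positions:
        vessels[cell_key(ts, lat, lon)].add(vessel_id)
    return {key: len(ids) for key, ids in vessels.items()}


def drill_down(positions, key):
    """Interactive step: return the raw records behind one overview bucket."""
    return [p for p in positions if cell_key(p[1], p[2], p[3]) == key]


overview = build_overview(POSITIONS)
selected = (51, 4, datetime(2013, 1, 1, 10))
print("Vessels in selected bucket:", overview[selected])
for record in drill_down(POSITIONS, selected):
    print(record)
```

In a deployment along the lines the slide describes, the overview would be precomputed in batch and stored for fast lookup, while the drill-down would run as an ad-hoc query against the stored detail records.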
Use case – Geo-spatial & time series dashboarding
batch-generated overview
interactive, ad-hoc drill-down
storage and access at scale
dashboard
Combine Different Data Sources
Streaming writes to Hadoop
Retail purchase Info
Real-time offers
Hadoop
POS/Online Data
MapR
MapR Distribution for Apache Hadoop
Complete Hadoop distribution
Comprehensive management suite
Industry-standard interfaces
Combines open source packages with Enterprise-grade dependability
Higher performance
[Diagram components: Pig, Hive, HBase, Mahout, Oozie, Whirr, Avro, Cascading, Nagios, Ganglia, Flume, Sqoop, HCatalog, Zookeeper, MapReduce, MapR Control System, MapR Data Platform]
MapR: The Enterprise Grade Distribution
Mission Critical: Self-healing, HA, Snapshots, Mirroring
Real-Time: Streaming Writes, Instant Recovery, Real-time NoSQL, Lightweight OLTP
Enterprise Integration: Industry Standard APIs, NFS, ODBC, LDAP, REST
MapR Supports Broad Set of Customers
Log analysis HBase
Customer targeting Social media analysis
Customer Revenue Analytics
ETL Offload
Advertising exchange analysis and optimization
Clickstream Analysis
Quality profiling/field failure analysis
Enterprise Grade Platform
COOP features
Monitoring and measuring online behavior
Fraud Detection Channel analytics
Recommendation Engine Fraud detection and Prevention
Customer Behavior Analysis Brand Monitoring
Customer targeting Viewer Behavioral analytics
Recommendation Engine Family tree connections
Global threat analytics
Virus analysis
Patient care monitoring
Leading Retailer
Global Credit Card Issuer
Intrusion detection & prevention Forensic analysis
Thank You
Industry Leaders Choose MapR in the Cloud
Google chose MapR to provide Hadoop on Google Compute Engine
Amazon EMR is the largest Hadoop provider in revenue and number of clusters