SAP Big Data Forum 2013, T2: Chris Harris - Hortonworks
TRANSCRIPT
© Hortonworks Inc. 2012
Apache Hadoop's Role in Your Big Data Architecture
Chris Harris, EMEA, Hortonworks
Twitter: cj_harris5
© Hortonworks Inc. 2013
Agenda
• The Growth of Enterprise Data
• Hadoop Market Drivers
• Hortonworks – an Overview
• The Future of Hadoop and Big Data
Data Explosion
The Growth of Data in the Enterprise
By 2015, organizations that build a modern information management
system will outperform their peers financially by 20 percent.
– Gartner, Mark Beyer, “Information Management in the 21st Century”
1 zettabyte (ZB) = 1 billion TB
15x growth in machine-generated data by 2020
Source: IDC
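The zettabyte conversion above is easy to check. This is a minimal sketch that assumes decimal (SI) prefixes, where 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes; it also shows the binary-prefix variant (TiB, ZiB), where the ratio is 2^30 rather than exactly one billion.

```python
# Minimal check of the slide's unit conversion, assuming SI (decimal) prefixes.
TB = 10**12  # bytes in a terabyte
ZB = 10**21  # bytes in a zettabyte

# Binary prefixes for comparison: 1 TiB = 2**40 bytes, 1 ZiB = 2**70 bytes.
TiB = 2**40
ZiB = 2**70

tb_per_zb = ZB // TB      # 1_000_000_000, i.e. 1 billion
tib_per_zib = ZiB // TiB  # 2**30 = 1_073_741_824

print(f"1 ZB  = {tb_per_zb:,} TB")
print(f"1 ZiB = {tib_per_zib:,} TiB")
```

With decimal prefixes the slide's "1 billion TBs" is exact; with binary prefixes it is about 7% larger.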
Next Generation Data Architecture Drivers
Business Drivers
• From reactive analytics to proactive customer interaction
• Find insights for competitive advantage & optimal returns
Technical Drivers
• Data continues to grow exponentially
• Data is increasingly everywhere and in many formats
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source
Market Transitioning into Early Majority
[Figure: technology adoption life cycle, relative % of customers over time, with "the chasm" separating early adopters from the early majority. Segments: innovators (technology enthusiasts), early adopters (visionaries), early majority (pragmatists), late majority (conservatives), laggards (skeptics). Early-market customers want technology & performance; mainstream customers want solutions & convenience.]
Source: Geoffrey Moore, Crossing the Chasm
Six Key Hadoop Data Types
1. Sentiment: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Text: Understand patterns in text across millions of web pages, emails, and documents
Apache Hadoop Enterprise Use Cases (1 of 2)
Use cases by vertical (use case: data type)
Financial Services
• New Account Risk Screens: Text, Server Logs
• Fraud Prevention: Server Logs
• Trading Risk: Server Logs
• Maximize Deposit Spread: Text, Server Logs
• Insurance Underwriting: Geographic, Sensor, Text
• Accelerate Loan Processing: Text
Telecom
• Call Detail Records (CDRs): Machine, Geographic
• Infrastructure Investment: Machine, Server Logs
• Next Product to Buy (NPTB): Clickstream
• Real-time Bandwidth Allocation: Server Logs, Text, Sentiment
• New Product Development: Machine, Geographic
Retail
• 360° View of the Customer: Clickstream, Text
• Analyze Brand Sentiment: Sentiment
• Localized, Personalized Promotions: Geographic
• Website Optimization: Clickstream
• Optimal Store Layout: Sensor
Apache Hadoop Enterprise Use Cases (2 of 2)
Use cases by vertical (use case: data type)
Manufacturing
• Supply Chain and Logistics: Sensor
• Assembly Line Quality Assurance: Sensor
• Proactive Maintenance: Machine
• Crowdsourced Quality Assurance: Sentiment
Healthcare
• Genomic Sequencing: Structured
• Real-time Data for Blood Sampling: Sensor, Server Logs
• Rapid, Mobile Detection of Autism: Unstructured
• Reducing Cost of Cancer Treatment: Sensor, Unstructured
• Perpetual Storage of Research Data: Sensor, Unstructured
New Account Risk Screens
Business Problem
• Banks take thousands of new account applications daily
• Text-based third-party risk reports are displayed to the banker
• Bankers can (and do) override risk recommendations to open accounts
• Account charge-offs and fraud cost banks millions
Solution
• HDP (Hortonworks Data Platform) helps senior managers control new account risk
• Match banker decisions with the multiple sources of information they use to make those decisions
• Correct risky behavior by sanctioning individuals, updating policies, improving training or identifying fraud
Financial Services
Data: Text, Server Logs
Fraud Prevention
Business Problem
• Financial institutions are always at risk of fraud
• Fraudsters test bank systems for vulnerabilities
• This testing leaves subtle patterns often undetected by bank employees or law enforcement
• Fraud losses cost banks millions
Solution
• HDP reduces the cost to detect fraudulent activity
• HDP stores more types of data for longer
• Analysis of data in the “data lake” exposes fraudulent patterns that would otherwise go undetected
Financial Services
Data: Server Logs
Call Detail Records (CDRs)
Business Problem
• Telcos perform forensics on dropped calls and sound quality
• Call detail records flow in at a rate of millions per second
• High volume makes pattern recognition and root cause analysis difficult, yet both need to happen in real time
• Delay causes attrition and harms servicing margins
Solution
• HDP can ingest millions of CDRs per second
• HDP facilitates data retention and root cause analysis
• Continuously improve call quality, customer satisfaction and servicing margins
Telecom
Data: Machine, Geo
Infrastructure Investment
Business Problem
• Telecom marketing and capacity planning are coordinated
• Consumption of bandwidth and services can be out of sync with plans for new towers and transmission lines
• A mismatch between infrastructure investments and the actual return on investment puts revenue at risk
Solution
• HDP helps telcos understand service consumption in a particular state, county or neighborhood
• Analyze Call Detail Records (CDRs) and network loads more intelligently, over longer periods of time
• Plan infrastructure with more precision and less variability
Telecom
Data: Machine, Logs
360° View of the Customer
Business Problem
• Retailers interact with customers across multiple channels
• Customer interaction and purchase data is often siloed
• Few retailers can correlate customer purchases with marketing campaigns and online browsing behavior
• Merging data in relational databases is expensive
Solution
• HDP gives retailers a 360° view of customer behavior
• Store data longer & track phases of the customer lifecycle
• Gain competitive advantage: increase sales, reduce supply chain expenses and retain the best customers
Retail
Data: Clickstream, Text
Analyze Brand Sentiment
Business Problem
• Enterprises lack a reliable way to track their brand health
• It is difficult to analyze how advertising, competitor moves, product launches or news stories affect the brand
• Internal brand studies can be slow, expensive and flawed
Solution
• HDP allows quick, unbiased brand sentiment snapshots
• Analyze sentiment from Twitter, Facebook, LinkedIn or industry-specific social media streams
• Retailers better understand customer perceptions and can align their communications, products and promotions with those perceptions and expectations
Retail
Data: Sentiment
Supply Chain and Logistics
Business Problem
• Manufacturers need just-in-time availability of components
• Stock-outs cause harmful production delays
• Sensors and RFID tags reduce the cost of capturing more supply chain data, which needs storage and processing
Solution
• HDP stores unstructured, streaming, “dirty” sensor data
• Manufacturers get lead time to make alternative arrangements for supply chain disruptions
• Prevent stock-outs, reduce supply chain costs and improve margins for the finished product
Manufacturing
Data: Sensor
Assembly Line Quality Assurance
Business Problem
• High-tech manufacturing uses sensors to capture data at critical steps in the manufacturing process
• Sensor data helps diagnose errors with returned products
• Much of this data is discarded because of high storage costs
• Lean margins mean small budgets for data analysis
Solution
• HDP stores unstructured, streaming, “dirty” sensor data
• Manufacturers can proactively analyze more data, over a longer time, to detect subtle issues that would otherwise go undetected
• Sensor data managed with HDP can help a manufacturer reduce warranty costs and earn a reputation for quality
Manufacturing
Data: Sensor
Growth Pressures Existing Data Architectures
[Figure: layered data architecture. Applications: packaged analytic app, custom analytic app. Data systems: traditional repos (RDBMS, EDW, MPP). Data sources: traditional sources (RDBMS, OLTP, OLAP), OLTP and POS systems. Side panels: operational tools (manage & monitor), dev & data tools (build & test).]
Data growth: 8% annually
An Emerging Data Architecture
[Figure: the same layered data architecture with an enterprise Hadoop platform added alongside the traditional repos (RDBMS, EDW, MPP). Applications: packaged analytic app, custom analytic app. Data sources: traditional sources (RDBMS, OLTP, OLAP), OLTP and POS systems, plus new sources (web logs, email, sensors, social media). Side panels: operational tools (manage & monitor), dev & data tools (build & test).]
Data growth: 85% annually
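Taking the two annual data-growth figures from these architecture slides (8% and 85%) at face value, a short calculation shows how quickly they diverge once compounded. This is a sketch under stated assumptions: the rates are treated as constant, and the five-year horizon is an arbitrary choice for illustration, not a figure from the talk.

```python
import math


def compound_growth(rate: float, years: int) -> float:
    """Volume multiplier after `years` of growth at `rate` per year (0.08 = 8%)."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years for volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + rate)


if __name__ == "__main__":
    for rate in (0.08, 0.85):
        print(f"{rate:.0%}/yr: x{compound_growth(rate, 5):.2f} after 5 years, "
              f"doubles every {doubling_time(rate):.1f} years")
```

Under these assumptions, 8% a year compounds to roughly 1.47x over five years (doubling about every 9 years), while 85% a year compounds to roughly 21.7x (doubling about every 1.1 years).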
Agenda
• The Growth of Enterprise Data
• Hadoop Market Drivers
• Hortonworks – an Overview
• The Future of Hadoop and Big Data
A Brief History of Apache Hadoop
[Timeline, 2004 to 2013]
• 2005: Yahoo! creates team under E14 to work on Hadoop
• 2011: Hortonworks created to focus on "Enterprise Hadoop"
• Other milestones marked on the timeline: Apache project established; Yahoo! begins to operate at scale; Hortonworks Data Platform; Enterprise Hadoop
Leadership Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High
Availability, Disaster Recovery
• 420k+ lines of code authored since 2006
– More than twice the nearest contributor
• Deeply integrating with the ecosystem
– Enabling new deployment platforms (e.g., Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (e.g., Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Agenda
• The Growth of Enterprise Data
• Hadoop Market Drivers
• Hortonworks – an Overview
• The Future of Hadoop and Big Data
The 1st Generation of Hadoop: Batch
HADOOP 1.0: Built for Web-Scale Batch Apps
[Figure: separate silos, each running single apps (batch, interactive, online) on its own HDFS]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
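To make "batch" concrete: the batch model Hadoop 1.0 was built around is MapReduce, which the YARN slide lists as the BATCH workload. The following is a toy, single-process Python sketch of the map, shuffle and reduce phases using word count over a few hypothetical log lines. It is not the Hadoop MapReduce API (that is Java-based and distributes these phases across a cluster); it only shows the shape of the computation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (in Hadoop the framework does this)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over the whole input: a batch job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    logs = ["error disk full", "warn disk slow", "error disk full"]
    print(word_count(logs))
```

The job only produces output after it has consumed its entire input, which is why this model suits large scheduled jobs but not the interactive or online workloads shown in separate silos.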
The Enterprise Requirement: Beyond Batch
To become an enterprise-viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS
– Simultaneously & with predictable levels of service
[Figure: batch, interactive, streaming, graph, in-memory, HPC MPI, online and search workloads all sharing HDFS (redundant, reliable storage)]
YARN: Taking Hadoop Beyond Batch
• Created to manage resource needs across all uses
• Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
– Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
Applications Run Natively IN Hadoop
[Figure, top to bottom: applications BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4,…), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search) (Weave…); then YARN (Cluster Resource Management); then HDFS2 (Redundant, Reliable Storage)]
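To illustrate what "managing resource needs across all uses" can mean in practice, here is a toy model of capacity-style sharing: each workload queue is guaranteed a fraction of cluster memory, and capacity a queue does not need is lent to queues that want more. This loosely mirrors the idea behind YARN's CapacityScheduler, but `allocate`, the queue names and the numbers are hypothetical, and the sketch does not use any YARN API.

```python
from typing import Dict


def allocate(total: float,
             guarantees: Dict[str, float],
             demands: Dict[str, float]) -> Dict[str, float]:
    """Toy capacity-style sharing of `total` resources among queues (hypothetical).

    guarantees: fraction of `total` each queue is guaranteed (each > 0, sum <= 1).
    demands: how much each queue currently asks for.
    """
    assert all(g > 0 for g in guarantees.values())
    assert sum(guarantees.values()) <= 1.0 + 1e-9

    # Step 1: every queue gets its demand, capped at its guaranteed share.
    alloc = {q: min(demands[q], guarantees[q] * total) for q in guarantees}
    spare = total - sum(alloc.values())

    # Step 2: lend spare capacity to queues that still want more, in proportion
    # to their guarantees, until spare runs out or all demand is met.
    while spare > 1e-9:
        hungry = {q: guarantees[q] for q in alloc if demands[q] - alloc[q] > 1e-9}
        if not hungry:
            break  # leftover capacity stays idle
        weight = sum(hungry.values())
        handed_out = 0.0
        for q, g in hungry.items():
            extra = min(spare * g / weight, demands[q] - alloc[q])
            alloc[q] += extra
            handed_out += extra
        spare -= handed_out
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: 100 GB of memory shared by three workload queues.
    shares = {"batch": 0.5, "interactive": 0.3, "streaming": 0.2}
    wants = {"batch": 80.0, "interactive": 10.0, "streaming": 30.0}
    for queue, amount in allocate(100.0, shares, wants).items():
        print(f"{queue}: {amount:.1f} GB")
```

Here the interactive queue uses only 10 GB of its 30 GB guarantee, so batch and streaming borrow the unused capacity in a 5:2 ratio, ending at about 64.3 GB and 25.7 GB. Real YARN schedulers also deal with preemption, hierarchical queues, per-user limits and container sizing; the sketch deliberately omits all of that.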
The Future of Hadoop and Big Data
• The next generation data architecture is evolving rapidly
– Store ALL data in a Hadoop data reservoir
– Push subsets of data to a final platform for processing
• Hadoop 2.0 takes Hadoop beyond “Batch”
– The YARN-based 2.0 architecture enables mixed-use workloads with enterprise resource management
• Enabling a new generation of applications at scale
– Based on new data types (sensor, sentiment, clickstream, etc.) or on keeping existing types for much longer
Hortonworks Sandbox
• Hands-on tutorials integrated into Sandbox
• HDP environment for evaluation
THANK YOU! Chris Harris
Download Sandbox
hortonworks.com/sandbox