SAP Big Data Forum 2013 T2: Chris Harris, Hortonworks


Upload: sap-nederland

Post on 20-Aug-2015


© Hortonworks Inc. 2012

Apache Hadoop's Role in Your Big Data Architecture

Chris Harris, EMEA, Hortonworks
[email protected]
Twitter: cj_harris5

© Hortonworks Inc. 2013

Agenda

• The Growth of Enterprise Data

• Hadoop Market Drivers

• Hortonworks – an Overview

• The Future of Hadoop and Big Data


The Growth of Data in the Enterprise

Data Explosion

By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

1 Zettabyte (ZB) = 1 billion TB

15x growth in machine-generated data by 2020

Source: IDC
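The zettabyte figure on this slide is a straight unit conversion. Assuming decimal (SI) prefixes, where 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes, the ratio can be checked directly in a couple of lines:

```python
# Assumes decimal (SI) units; binary units (TiB, ZiB) would give a ratio of 2**30 instead.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte

terabytes_per_zettabyte = ZB // TB
print(f"1 ZB = {terabytes_per_zettabyte:,} TB")  # prints "1 ZB = 1,000,000,000 TB"
```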


Next Generation Data Architecture Drivers

Business Drivers
• From reactive analytics to proactive customer interaction
• Find insights for competitive advantage & optimal returns

Technical Drivers
• Data continues to grow exponentially
• Data is increasingly everywhere and in many formats

Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source


Market Transitioning into Early Majority

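[Chart: technology adoption lifecycle, plotting relative % of customers over time, with "The Chasm" separating early adopters from the early majority. Segments, left to right: Innovators, technology enthusiasts; Early adopters, visionaries; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics. Customers on the early side want technology & performance; customers on the mainstream side want solutions & convenience.]

Source: Geoffrey Moore, Crossing the Chasm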


6 Key Hadoop Data Types

1. Sentiment: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Text: Understand patterns in text across millions of web pages, emails, and documents


Apache Hadoop Enterprise Use Cases (1 of 2)

Use case: data type, grouped by vertical

Financial Services
• New Account Risk Screens: Text, Server Logs
• Fraud Prevention: Server Logs
• Trading Risk: Server Logs
• Maximize Deposit Spread: Text, Server Logs
• Insurance Underwriting: Geographic, Sensor, Text
• Accelerate Loan Processing: Text

Telecom
• Call Detail Records (CDRs): Machine, Geographic
• Infrastructure Investment: Machine, Server Logs
• Next Product to Buy (NPTB): Clickstream
• Real-time Bandwidth Allocation: Server Logs, Text, Sentiment
• New Product Development: Machine, Geographic

Retail
• 360° View of the Customer: Clickstream, Text
• Analyze Brand Sentiment: Sentiment
• Localized, Personalized Promotions: Geographic
• Website Optimization: Clickstream
• Optimal Store Layout: Sensor


Apache Hadoop Enterprise Use Cases (2 of 2)

Use case: data type, grouped by vertical

Manufacturing
• Supply Chain and Logistics: Sensor
• Assembly Line Quality Assurance: Sensor
• Proactive Maintenance: Machine
• Crowdsourced Quality Assurance: Sentiment

Healthcare
• Genomic Sequencing: Structured
• Real-time Data for Blood Sampling: Sensor, Server Logs
• Rapid, Mobile Detection of Autism: Unstructured
• Reducing Cost of Cancer Treatment: Sensor, Unstructured
• Perpetual Storage of Research Data: Sensor, Unstructured


New Account Risk Screens

Business Problem

• Banks take thousands of new account applications daily
• Text-based third-party risk reports are displayed to the banker
• Bankers can (and do) override risk recommendations to open accounts
• Account charge-offs and fraud cost banks millions

Solution

• HDP helps senior managers control new account risk
• Match banker decisions with the multiple sources of information bankers use to make those decisions
• Correct risky behavior by sanctioning individuals, updating policies, improving training or identifying fraud

Financial Services

Data: Text, Server Logs


Fraud Prevention

Business Problem

• Financial institutions are always at risk of fraud
• Fraudsters test bank systems for vulnerabilities
• This testing leaves subtle patterns that often go undetected by bank employees or law enforcement
• Fraud losses cost banks millions

Solution

• HDP reduces the cost to detect fraudulent activity
• HDP stores more types of data for longer
• Analysis of data in the “data lake” exposes fraudulent patterns that would otherwise have gone undetected

Financial Services

Data: Server Logs


Call Detail Records (CDRs)

Business Problem

• Telcos perform forensics on dropped calls and sound quality
• Call detail records flow in at a rate of millions per second
• High volume makes pattern recognition and root cause analysis difficult, and both need to happen in real time
• Delays cause attrition and harm servicing margins

Solution

• HDP can ingest millions of CDRs per second
• HDP facilitates data retention and root cause analysis
• Continuously improve call quality, customer satisfaction and servicing margins

Telecom

Data: Machine, Geo
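The slide gives only a rate ("millions per second"). To show why retaining CDRs at that rate calls for a low-cost store, the sketch below turns a steady ingest rate into a daily volume. Both inputs, 1 million records per second and a 500-byte record, are hypothetical assumptions for illustration, not figures from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume(records_per_second: int, bytes_per_record: int) -> tuple[int, float]:
    """Return (records per day, decimal terabytes per day) for a steady ingest rate."""
    records = records_per_second * SECONDS_PER_DAY
    terabytes = records * bytes_per_record / 10**12  # decimal TB
    return records, terabytes


# Hypothetical inputs: 1 million CDRs/s sustained, 500 bytes per record.
records, tb = daily_volume(1_000_000, 500)
print(f"{records:,} CDRs/day, about {tb:.1f} TB/day")
```

Under these assumptions that is 86.4 billion records, or about 43.2 TB, per day; the result scales linearly with both the rate and the record size.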


Infrastructure Investment

Business Problem

• Telecom marketing and capacity planning are coordinated
• Consumption of bandwidth and services can be out of sync with plans for new towers and transmission lines
• A mismatch between infrastructure investments and the actual return on investment puts revenue at risk

Solution

• HDP helps telcos understand service consumption in a particular state, county or neighborhood
• Analyze Call Detail Records (CDRs) and network loads more intelligently, over longer periods of time
• Plan infrastructure with more precision and less variability

Telecom

Data: Machine, Logs


360° View of the Customer

Business Problem

• Retailers interact with customers across multiple channels
• Customer interaction and purchase data is often siloed
• Few retailers can correlate customer purchases with marketing campaigns and online browsing behavior
• Merging data in relational databases is expensive

Solution

• HDP gives retailers a 360° view of customer behavior
• Store data longer & track phases of the customer lifecycle
• Gain competitive advantage: increase sales, reduce supply chain expenses and retain the best customers

Retail

Data: Clickstream, Text


Analyze Brand Sentiment

Business Problem

• Enterprises lack a reliable way to track their brand health
• It is difficult to analyze how advertising, competitor moves, product launches or news stories affect the brand
• Internal brand studies can be slow, expensive and flawed

Solution

• HDP allows quick, unbiased brand sentiment snapshots
• Analyze sentiment from Twitter, Facebook, LinkedIn or industry-specific social media streams
• Retailers better understand customer perceptions and can align their communications, products and promotions with those perceptions and expectations

Retail

Data: Sentiment


Supply Chain and Logistics

Business Problem

• Manufacturers need just-in-time availability of components
• Stock-outs cause harmful production delays
• Sensors and RFID tags reduce the cost of capturing more supply chain data, which needs storage and processing

Solution

• HDP stores unstructured, streaming, “dirty” sensor data
• Manufacturers get lead time to make alternative arrangements for supply chain disruptions
• Prevent stock-outs, reduce supply chain costs and improve margins for the finished product

Manufacturing

Data: Sensor


Assembly Line Quality Assurance

Business Problem

• High-tech manufacturing uses sensors to capture data at critical steps in the manufacturing process
• Sensor data helps diagnose errors with returned products
• Much data is discarded because of high storage costs
• Lean margins mean small budgets for data analysis

Solution

• HDP stores unstructured, streaming, “dirty” sensor data
• Manufacturers can proactively analyze more data, over a longer time, to detect subtle issues that would otherwise go undetected
• Sensor data managed with HDP can help a manufacturer reduce warranty costs and earn a reputation for quality

Manufacturing

Data: Sensor


Growth Pressures Existing Data Architectures

[Diagram: an Applications layer (Packaged Analytic App, Custom Analytic App) over Data Systems (Traditional Repos: RDBMS, EDW, MPP), fed by Data Sources (OLTP, POS systems; Traditional Sources: RDBMS, OLTP, OLAP). Side panels: Operational Tools (manage & monitor) and Dev & Data Tools (build & test).]

Data growth: 8% annually


An Emerging Data Architecture

[Diagram: the same Applications, Data Systems, and Data Sources layers, with an Enterprise Hadoop Platform added to the data systems (next to RDBMS, EDW, MPP) and New Sources (web logs, email, sensors, social media) added to the data sources (next to the traditional RDBMS, OLTP, OLAP sources). Side panels: Operational Tools (manage & monitor) and Dev & Data Tools (build & test).]

Data growth: 85% annually
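Taking the two annual growth rates on these slides (8% for the existing architecture, 85% for the emerging one) at face value, a short compounding sketch shows how quickly they diverge. The five-year horizon and the starting volume of 1.0 are arbitrary illustrative choices, not figures from the slides.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Volume after `years` of growth at a fixed annual rate (e.g. 0.08 for 8%)."""
    return start * (1 + annual_rate) ** years


# Illustrative horizon only; the slides give just the annual rates.
YEARS = 5
for label, rate in [("existing architecture", 0.08), ("emerging architecture", 0.85)]:
    factor = compound(1.0, rate, YEARS)
    print(f"{label}: {rate:.0%} per year -> x{factor:.1f} after {YEARS} years")
```

Under these assumptions, 8% a year compounds to roughly 1.5x over five years, while 85% a year compounds to roughly 21.7x.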


An Emerging Data Architecture


Agenda

• The Growth of Enterprise Data

• Hadoop Market Drivers

• Hortonworks – an Overview

• The Future of Hadoop and Big Data


A Brief History of Apache Hadoop

[Timeline, 2004 to 2013. Milestones shown: 2005: Yahoo! creates team under E14 to work on Hadoop; Apache project established; Yahoo! begins to operate at scale; 2011: Hortonworks created to focus on “Enterprise Hadoop”; Hortonworks Data Platform; Enterprise Hadoop.]


Leadership Starts at the Core

• Driving next-generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice the nearest contributor
• Deeply integrating with the ecosystem
– Enabling new deployment platforms (e.g., Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (e.g., Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache


Agenda

• The Growth of Enterprise Data

• Hadoop Market Drivers

• Hortonworks – an Overview

• The Future of Hadoop and Big Data


The 1st Generation of Hadoop: Batch

HADOOP 1.0: Built for Web-Scale Batch Apps

[Diagram: separate HDFS clusters, each dedicated to a single app (batch, interactive, or online).]

• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
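Hadoop 1.0's batch engine is MapReduce: a map phase emits key/value pairs, the framework groups them by key (the shuffle), and a reduce phase aggregates each group. The sketch simulates those three phases in plain Python over a few made-up server-log lines; the log format and the status-code counting task are hypothetical, and the code illustrates the programming model only, not Hadoop's own Java API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log lines in the form "<host> <http_status> <path>".
LOG_LINES = [
    "web01 200 /index.html",
    "web02 500 /checkout",
    "web01 404 /missing",
    "web02 500 /checkout",
    "web03 200 /index.html",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (status_code, 1) for one log line."""
    _host, status, _path = line.split()
    yield status, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(run_job(LOG_LINES))  # {'200': 2, '404': 1, '500': 2}
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network; here everything runs in one process so the data flow is easy to follow.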


The Enterprise Requirement: Beyond Batch

Customers have told us that, for Hadoop to become an enterprise-viable data platform, they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS
– Simultaneously & with predictable levels of service

[Diagram: BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, and SEARCH workloads on a single HDFS (Redundant, Reliable Storage) layer.]


YARN: Taking Hadoop Beyond Batch

• Created to manage resource needs across all uses

• Ensures predictable performance & QoS for all apps

• Enables apps to run “IN” Hadoop rather than “ON”

– Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.

Applications Run Natively IN Hadoop

[Diagram, layered top to bottom:]
BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4,…), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search) (Weave…)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
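YARN's role is to divide one cluster's resources among many applications so that each gets a predictable share. The toy sketch illustrates that idea with fixed percentage shares per queue, loosely modeled on the percentage queue capacities of YARN's CapacityScheduler. The queue names, percentages, cluster size, and the simple proportional split are all hypothetical assumptions, not the scheduler's actual algorithm.

```python
# Hypothetical queues and capacity percentages; they must sum to 100.
QUEUE_CAPACITY_PCT = {"batch": 50, "interactive": 30, "streaming": 20}


def guaranteed_memory_mb(cluster_memory_mb: int, capacities: dict[str, int]) -> dict[str, int]:
    """Split cluster memory into guaranteed per-queue shares by percentage."""
    if sum(capacities.values()) != 100:
        raise ValueError("queue capacities must sum to 100%")
    return {queue: cluster_memory_mb * pct // 100 for queue, pct in capacities.items()}


# Hypothetical cluster: 10 nodes x 64 GB of memory available to containers.
cluster_mb = 10 * 64 * 1024
shares = guaranteed_memory_mb(cluster_mb, QUEUE_CAPACITY_PCT)
for queue, mb in shares.items():
    print(f"{queue}: {mb} MB guaranteed")
```

A fixed proportional split keeps the example short; the real CapacityScheduler also lets a queue borrow idle capacity from others, which this sketch omits.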


The Future of Hadoop and Big Data

• The next-generation data architecture is evolving rapidly
– Store ALL data in a Hadoop data reservoir
– Push subsets of data to a final platform for processing
• Hadoop 2.0 takes Hadoop beyond “Batch”
– The YARN-based 2.0 architecture enables mixed-use workloads with enterprise resource management
• Enabling a new generation of applications at scale
– Based on new data types (sensor, sentiment, clickstream, etc.) or on keeping existing types for much longer


Hortonworks Sandbox

• Hands-on tutorials integrated into Sandbox
• HDP environment for evaluation


THANK YOU! Chris Harris

[email protected]

Download Sandbox

hortonworks.com/sandbox