demystify big data breakfast briefing: herb cunitz, hortonworks


Upload: hortonworks

Post on 26-Jan-2015

DESCRIPTION

Demystify Big Data Breakfast Briefing, 9th July London - Herb Cunitz Hortonworks

TRANSCRIPT

Page 1: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks

© Hortonworks Inc. 2013. Confidential and Proprietary.

Hadoop in London

July 9, 2013

Herb Cunitz

Hortonworks President

@hcunitz


Page 2: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Why is Hadoop Important?

We Believe that More than Half the World's Data Will Be Processed by Apache Hadoop.

Page 3: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks

By 2015, Organizations that Build a Modern Information Management System Will Outperform their Peers Financially by 20 Percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

Page 4: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Traditional Data Architecture

Pressured

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)

DATA SOURCES: OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, clickstream, geo, sensor, …)

OPERATIONAL TOOLS: MANAGE & MONITOR

DEV & DATA TOOLS: BUILD & TEST

Page 5: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Pressured Traditional Data Architecture

New Sources (sentiment, clickstream, geo, sensor, …)

2.8 ZB in 2012

40 ZB by 2020

85% from New Data Types

15x Machine Data by 2020

Source: IDC
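
The two IDC volume figures on this slide (2.8 ZB in 2012, 40 ZB by 2020) imply a growth rate that the slide does not state. The sketch below works it out, assuming constant year-over-year compounding between the two dates; that smoothing is an illustrative assumption, not something attributed to IDC or the speaker, and the function name is hypothetical.

```python
# Figures quoted on the slide (source: IDC).
START_YEAR, START_ZB = 2012, 2.8
END_YEAR, END_ZB = 2020, 40.0


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years.

    Assumes smooth compounding, which is a simplification for illustration.
    """
    return (end / start) ** (1 / years) - 1


# Overall multiple between the two data points.
growth_factor = END_ZB / START_ZB

# Compound annual growth rate over the eight-year span.
annual_rate = implied_annual_growth(START_ZB, END_ZB, END_YEAR - START_YEAR)

print(f"Overall growth: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that assumption the figures work out to roughly a 14-fold increase over eight years, or a little under 40% growth per year.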

Page 6: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Modern Data Architecture Enabled

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM

DATA SOURCES: OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, clickstream, geo, sensor, …)

OPERATIONAL TOOLS: MANAGE & MONITOR

DEV & DATA TOOLS: BUILD & TEST

Page 7: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Agile “Data Lake” Solution Architecture

1. Capture All Data

2. Process & Structure

3. Distribute Results

4. Feedback & Retain

Dashboards, Reports, Visualization, …

Web, Mobile, CRM, ERP, Point of sale

Business Transactions & Interactions

Business Intelligence & Analytics

Classic Data Integration & ETL

Logs & Text Data

Sentiment Data

Structured DB Data

Clickstream Data

Geo & Tracking Data

Sensor & Machine Data

Enterprise Hadoop Platform

Page 8: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Key Requirement of a “Data Lake”

Store ALL DATA in one place…

…and Interact with that data in MULTIPLE WAYS

BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER…

HDFS (Redundant, Reliable Storage)

Page 9: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Applications Run Natively IN Hadoop

BATCH: MapReduce

INTERACTIVE: Tez

STREAMING: Storm

GRAPH: Giraph

IN-MEMORY: Spark

HPC MPI: OpenMPI

ONLINE: HBase

OTHER…: e.g. Search

YARN Takes Hadoop Beyond Batch

Applications run “IN” Hadoop versus “ON” Hadoop…

…with Predictable Performance and Quality of Service

HDFS2 (Redundant, Reliable Storage)

YARN (Cluster Resource Management)

Page 10: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


2.0 Architected for the Broad Enterprise

Hadoop 2.0 Key Highlights

Single Cluster, Many Workloads: BATCH, INTERACTIVE, ONLINE, STREAMING

HDP 2.0 Features: Enterprise Requirements

Rolling Upgrades: ZERO downtime

Disaster Recovery: Multi Data Center

Snapshots: Point in time Recovery

Full Stack HA: Reliability

Hive on Tez: Interactive Query

YARN: Mixed workloads

Page 11: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Making Hadoop Enterprise Ready

Enterprise Hadoop Platform

PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Security and Snapshots)

OPERATIONAL SERVICES: Manage & Operate at Scale

DATA SERVICES: Store, Process and Access Data

CORE: Distributed Storage & Processing

OS/VM, Cloud, Appliance

Page 12: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


SQL-IN-Hadoop with Apache Hive

Stinger Initiative Focus Areas

Make Hive 100X Faster

Make Hive SQL Compliant

Stack, top to bottom: Business Analytics, Custom Apps; SQL; HIVE; MAPREDUCE, TEZ; YARN; HDFS2

Page 13: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Falcon: One-stop Shop for Data Lifecycle

Falcon: Data Lifecycle Management Framework

Data Import and Replication

Scheduling and Coordination

Data Lifecycle Policies

Multi-Cluster Management

SLA Management

Page 14: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Knox: Simplify Hadoop User Access

Client

{REST} Knox gateway cluster

Authentication & Verification

User Store: KDC, AD, LDAP

Hadoop Cluster

Simplify Security: For both users and operators

Aggregate Access: Deliver unified access for a ‘single application’ feel

Client Agility: Abstract users from location of services

Page 15: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Many Communities Must Work As One

Innovate

Participate

Integrate

Open Source

End Users

Vendors

Page 16: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Ecosystem Completes the Puzzle

Data Systems

Applications, Business Tools, & Dev Tools

Infrastructure & Systems Management

Page 17: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Hadoop Wave ONE: Web-scale Batch Apps

[Figure: technology adoption curve. X-axis: time. Y-axis: relative % customers. Source: Geoffrey Moore - Crossing the Chasm]

Segments along the curve: Innovators, technology enthusiasts; Early adopters, visionaries; The CHASM; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics

Customers want technology & performance

Customers want solutions & convenience

2006 to 2012: Web-Scale Batch Applications

Page 18: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Hadoop Wave TWO: Broad Enterprise Apps

[Figure: technology adoption curve. X-axis: time. Y-axis: relative % customers. Source: Geoffrey Moore - Crossing the Chasm]

Segments along the curve: Innovators, technology enthusiasts; Early adopters, visionaries; The CHASM; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics

Customers want technology & performance

Customers want solutions & convenience

2013 & Beyond: Batch, Interactive, Online, Streaming, etc.

Page 19: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Hortonworks – We Do Hadoop

Open Source Community

Partner Ecosystem

Commercial Adoption

Page 20: Demystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks


Thank You
