EMC Hadoop Storage Strategy

© Copyright 2014 EMC Corporation. All rights reserved.

Ed Walsh - @vEddieW
Jim Ruddy - @Darth_Ruddy
Dan Baskette - @dbbaskette

Upload: walshe1

Post on 04-Dec-2014

DESCRIPTION

EMC Hadoop Storage Strategy, presented at EMC World 2014

TRANSCRIPT

Page 1: EMC HADOOP Storage Strategy

EMC Hadoop Storage Strategy

Ed Walsh - @vEddieW
Jim Ruddy - @Darth_Ruddy
Dan Baskette - @dbbaskette

Page 2: EMC HADOOP Storage Strategy

CHANGES IN ANALYTICS

DATA VOLUME

DATA VELOCITY

DATA TYPES

APPS

Page 3: EMC HADOOP Storage Strategy

DATA LAKE TECHNOLOGY

HADOOP DISTRIBUTED FILE SYSTEM

• Highly scalable & portable

Apache Open Source Specification

• Structured and unstructured data

• Analytics API interface standard

• Storage hardware flexibility

• Performance optimized for large file access

HDFS TRADE-OFFS

• Optimized for streaming writes; poor for random seeks

• Write once file system

• Hardware failure results in reduced performance

• Specialized file system, not designed for general use

HDFS Architecture

Client

NameNode

SecondaryNameNode

(Now called checkpoint or backup node)

Where do I read or write data?

Just these nodes

DataNode

DataNode

DataNode

Data

Status

HDFS Data
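
The flow in the architecture diagram above (the client asks the NameNode where to read or write, then exchanges data only with the DataNodes it was pointed to) can be sketched with a toy in-memory model. This is an illustration only, not the Hadoop API: `ToyNameNode`, `ToyDataNode`, `read_file`, the block IDs, and the file path are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block bytes by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: holds metadata only, no file data."""
    # path -> ordered list of (block_id, [names of DataNodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def locate(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Ask the NameNode where the blocks are, then read from those DataNodes only."""
    contacted = []
    chunks = []
    for block_id, replica_names in namenode.locate(path):
        node_name = replica_names[0]  # simplest choice: first listed replica
        contacted.append(node_name)
        chunks.append(datanodes[node_name].read_block(block_id))
    return b"".join(chunks), contacted


# Example cluster (made up): three DataNodes holding one two-block file.
datanodes = {
    "dn1": ToyDataNode("dn1", {"blk_1": b"hello "}),
    "dn2": ToyDataNode("dn2", {"blk_2": b"world"}),
    "dn3": ToyDataNode("dn3", {"blk_1": b"hello ", "blk_2": b"world"}),
}
namenode = ToyNameNode(
    {"/logs/a.txt": [("blk_1", ["dn1", "dn3"]), ("blk_2", ["dn2", "dn3"])]}
)

data, contacted = read_file("/logs/a.txt", namenode, datanodes)
```

In this sketch the NameNode never touches file bytes; it only answers the location query, which is why the diagram labels its link "Status" and the DataNode links "Data".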

Page 4: EMC HADOOP Storage Strategy

DATA LAKE TECHNOLOGY

HADOOP TIER

Seven DataNodes, each with HDFS storage

PROCESSING TIER – MR, HIVE, ETC.

DEEP SCALE SQL ANALYTICS – PIVOTAL HAWQ

IN MEMORY TIER

SQL OBJECTS JSON

DATABASES

Operational data is the focus (it is in memory, mostly)

Continue to work with RDBs

All data, history in HDFS

HDFS data files directly accessible inside Hadoop

Analytic results routed to memory tier

Page 5: EMC HADOOP Storage Strategy

DATA LAKE STORAGE FEATURES

NO SILOS

Multi-protocol access

Simultaneous access for unstructured data

Separation of storage from access protocol

OPTIMIZED COST

Choice of storage hardware

Multi-vendor, no lock-in

LIMITLESS SCALE

Expand capacity as needed

Massive scale-out

Highly available

Page 6: EMC HADOOP Storage Strategy

INTEGRATED HDFS WITH HADOOP DISTRO

STRENGTHS

• Tightly coupled with Hadoop software

• Low cost

• Storage hardware choice

• Integrated software support

• Data locality

Page 7: EMC HADOOP Storage Strategy

HDFS STORAGE ARRAY INTERFACE

STRENGTHS

• No ingest necessary

• NameNode Fault Tolerance

• Eliminate 3x mirroring

• Multi-protocol access

• Simultaneous Multi-Hadoop distribution support

• Smart-Dedupe for Hadoop

• SEC 17a-4 Compliance

• Kerberos Authentication

• Application Multi-tenancy

EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-26892
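
The "Eliminate 3x mirroring" point above is about raw capacity: with three full copies, every usable terabyte consumes three raw terabytes. A minimal sketch of that arithmetic follows. The 3x factor is the one named on the slide; the 1.25x overhead for array-side protection and the 100 TB dataset size are assumed figures chosen for illustration, not numbers from the presentation, and `raw_capacity_tb` is a hypothetical helper.

```python
def raw_capacity_tb(usable_tb, overhead_factor):
    """Raw capacity needed to hold `usable_tb` of data at a given protection overhead."""
    if usable_tb < 0 or overhead_factor < 1:
        raise ValueError("usable_tb must be >= 0 and overhead_factor must be >= 1")
    return usable_tb * overhead_factor


USABLE_TB = 100                # example dataset size (assumption)
TRIPLE_MIRROR = 3.0            # three full copies, as named on the slide
ASSUMED_ARRAY_OVERHEAD = 1.25  # hypothetical parity-style overhead, illustration only

mirrored = raw_capacity_tb(USABLE_TB, TRIPLE_MIRROR)
protected = raw_capacity_tb(USABLE_TB, ASSUMED_ARRAY_OVERHEAD)
saved = mirrored - protected

print(f"3x mirroring: {mirrored} TB raw")
print(f"Assumed array protection: {protected} TB raw")
print(f"Difference: {saved} TB raw")
```

The actual overhead depends on the protection scheme a given array uses, so the difference should be recomputed with real figures before drawing any sizing conclusion.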

Page 8: EMC HADOOP Storage Strategy

HDFS BY STORAGE VIRTUALIZATION SOFTWARE

STRENGTHS

• Multi-protocol access

Object, HDFS, Block (iSCSI), more coming

Write file, read object & vice versa

• NameNode Fault Tolerance

• Eliminate 3x mirroring

• Compute & data locality

• Application multi-tenancy

• Heterogeneous storage: pool server storage, enterprise arrays (EMC, NetApp, Hitachi)

EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-34442
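
The "Write file, read object & vice versa" bullet above describes one copy of the data reachable through more than one access protocol. A toy sketch of that idea is shown here. It is not any EMC product's interface: `ToyMultiProtocolStore`, its method names, and the path-to-key mapping (first path component treated as the bucket) are assumptions made for illustration.

```python
class ToyMultiProtocolStore:
    """Hypothetical store: one copy of the bytes, reachable by file path or object key."""

    def __init__(self):
        self._data = {}  # canonical key -> bytes (single copy, no per-protocol silo)

    @staticmethod
    def _canonical(bucket, key):
        return f"{bucket}/{key}"

    # File-style interface: paths look like /bucket/dir/name
    def write_file(self, path, payload):
        self._data[path.lstrip("/")] = payload

    def read_file(self, path):
        return self._data[path.lstrip("/")]

    # Object-style interface: (bucket, key) pairs
    def put_object(self, bucket, key, payload):
        self._data[self._canonical(bucket, key)] = payload

    def get_object(self, bucket, key):
        return self._data[self._canonical(bucket, key)]


store = ToyMultiProtocolStore()

# Write as a file, read back as an object.
store.write_file("/sensors/2014/readings.csv", b"t,temp\n0,21.5\n")
via_object = store.get_object("sensors", "2014/readings.csv")

# Write as an object, read back as a file.
store.put_object("sensors", "2014/alarms.csv", b"t,alarm\n3,high\n")
via_file = store.read_file("/sensors/2014/alarms.csv")
```

Because both interfaces resolve to the same canonical key, no copy or ingest step sits between the file view and the object view, which is the property the slide highlights.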

Page 9: EMC HADOOP Storage Strategy

ANALYTICS APPLIANCES

STRENGTHS

• Rapid deployment

• Predictable performance & scale

• Optimum resource utilization

• Integrated, simplified management

• Simplified support & maintenance

• Optimized cost

• Highest Reliability, Availability, and Stability

Page 10: EMC HADOOP Storage Strategy

Traditional Analytics Architecture

RMT

Historian

IMAS

Alarm

LIMS

Oracle

BI (SSRS, Panopticon, Web)

Analytics Server (SAS)

Analytics Server (R)

Pre-aggregated Tables

BI (Cognos)

Page 11: EMC HADOOP Storage Strategy

Modern Analytics Architecture

EMC Data Lake Architecture

RMT

Historian

IMAS

Alarm

LIMS

BI Server (SSRS, Panopticon, Web)

Analytics Server (SAS)

Analytics Server (R)

Historian

Alpine/Chorus (Pivotal)

“Real Time” Feed

BI Server (Tableau or other)

Reporting DB

GemFire XD

HAWQ

HDFS

Page 12: EMC HADOOP Storage Strategy

MODERN ANALYTICS USING DATA LAKE

DEMO

Page 13: EMC HADOOP Storage Strategy

EMC DATA LAKE CAPABILITIES

Documents (XLS, PPT, DOC)

SQL Databases

Rich Media (PDF, JPG, Video, Streaming)

Sensor Data (GPS coordinates, temperature measurements)

Unstructured Context (Web Server Logs,

Scale Effortlessly | Store Efficiently | Access Globally

Page 14: EMC HADOOP Storage Strategy

Ed Walsh - @vEddieW
Jim Ruddy - @Darth_Ruddy
Dan Baskette - @dbbaskette

Page 15: EMC HADOOP Storage Strategy

The Emerging Data Platform Ecosystem

Business Data Lake

Ingestion Tier: Real-time, Batch, Micro batch

Data Sources: Clickstream, Sensors, Telemetrics, Weblogs, Network Data, CRM, ERP Data, Collab

Insights Tier

SQL

MapReduce

NoSQL

Spark

R

Action Tier

Real-time Insights

Batch Insights

Interactive Insights

Operations Tier

Data Services Tier

Processing Tier

MDM, RDM

Audit and Policy mgmt

Data mgmt services

Systems monitoring and management

Relational Database

MPP Database

In-memory processing

Workflow Management

Hadoop

App Server

Web Services

Data Management Tier

HDFS

Software-defined Storage

Enterprise SAN/NAS

Infrastructure Tier

Public Cloud

Hybrid Cloud

Private Cloud
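
The Ingestion Tier on this slide names three modes: real-time, batch, and micro batch. A minimal sketch of micro-batching is given here to make the distinction concrete. It groups events by count, which is an assumption made for simplicity (production systems often batch by time window instead), and `micro_batches` and the sample click events are hypothetical.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield lists of up to `batch_size` events from any iterable of events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


clicks = [f"click-{i}" for i in range(7)]  # made-up event stream

# Micro batch: small groups handed downstream as they fill up.
batches = list(micro_batches(clicks, 3))

# The other two modes are the extremes of the same idea in this toy model:
one_at_a_time = list(micro_batches(clicks, 1))          # closest to real-time
single_batch = list(micro_batches(clicks, len(clicks)))  # classic batch
```

With seven events and a batch size of three, the final batch is short; a downstream consumer has to accept partial batches.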

Page 16: EMC HADOOP Storage Strategy

Business Data Lake

EMC Federation Solutions

Data Sources: Clickstream, Sensors, Telemetrics, Weblogs, Network Data, CRM, ERP Data, Collab

Ingestion Tier: Real-time, Batch, Micro batch

Operations Tier

Data Services Tier

Processing Tier

MDM, RDM

Audit and Policy mgmt

Data mgmt services

Systems monitoring and management

Relational Database

MPP Database

In-memory processing

Data Management Tier

Workflow Management

Insights Tier

SQL

MapReduce

NoSQL

Spark

R

Action Tier

Real-time Insights

Batch Insights

Interactive Insights

Hadoop

App Server

Web Services

HDFS

Software-defined Storage

Enterprise SAN/NAS

Infrastructure Tier

Public Cloud

Hybrid Cloud

Private Cloud

VMware vCloud Suite

vCloud Hybrid Services