
© 2009 VMware Inc. All rights reserved

Cloud + Big Data – Putting it all Together

Even Solberg

3

Cloud Delivery Model Data as a service for private and public clouds

Big, Fast and Flexible Data

Flexible: OSS Relational, Document, Object, Key / Value

Fast: OLTP workloads, Analytic workloads

Big: Big Data Processing, Big Data Analytics

4


Flexible: OSS Relational, Document, Object, Key / Value

Fast: OLTP workloads, Analytic workloads

Big: Big Data Processing, Big Data Analytics

Products: Serengeti, GemFire, vPostgres

5

Cloud Stack – Neutral View

SaaS

PaaS

IaaS

6

Big Data IaaS

7

…but first, some Background. How to build an IaaS Cloud

Central Infrastructure Management

Automated ITIL Process Including Approvals

Automated Provisioning Multi Tenancy IT Service Catalog Resource Distribution Resource Allocation

Customer A, Customer B, Customer C, Customer D (each with its own Service Catalog, Users and Groups)

Administrative Interface / Resource Allocation and Definition

Out Of The Box Integration

Human Interaction

Integration must be built

Generate Ticket, 1st Line Support

Performance Mgmt Resource Mgmt Capacity Mgmt Compliance Mgmt

Cost Models Usage Allocation Pay As You Go CB / SB

Exported Billing Information

Service Delivery Management

https://customer.portal.org Service Catalog Workflow engine SLA Descriptions Show back Billing Information Customer Portal

Service Owner

Network & Security Firewall VPN Load Balancer NAT

ITSM Ticketing Change Mgmt Support

Change & Release Mgmt

Service Renewal

Gold Silver

Bronze

Cust C Cust B Cust A


VMware products mapped onto the IaaS building blocks:

vCNS

vCenter Operations Suite

Service Manager

DynamicOps

Application Director

Service Manager / ITBM

vSphere

vCenter Chargeback

vCloud Director

Organization: Finance (Users & Policies, Org VDC, Catalogs)

Organization: Marketing (Users & Policies, Org VDC, Catalogs)

Provider Virtual Datacenters: Gold, Silver, Bronze

VMware vCenter Server: Resource Pools, Datastores, Port Groups

VMware vSphere
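One way to read the diagram above is as a small object model: organizations own org VDCs and catalogs, and each org VDC draws on a provider virtual datacenter tier. The sketch below is only an illustration of that reading. The class names, fields, and capacity figures are hypothetical and are not the vCloud Director API.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderVDC:
    """A tier of pooled vSphere resources (tier names come from the slide)."""
    name: str
    cpu_ghz: float      # hypothetical capacity figure
    memory_gb: float    # hypothetical capacity figure


@dataclass
class OrgVDC:
    """A slice of a provider VDC handed to one organization."""
    name: str
    provider: ProviderVDC
    cpu_ghz: float
    memory_gb: float


@dataclass
class Organization:
    """A tenant with its own VDCs and catalogs."""
    name: str
    vdcs: list = field(default_factory=list)
    catalogs: list = field(default_factory=list)


def allocated(provider, orgs):
    """Sum the CPU and memory that all org VDCs draw from one provider VDC."""
    vdcs = [v for o in orgs for v in o.vdcs if v.provider is provider]
    return sum(v.cpu_ghz for v in vdcs), sum(v.memory_gb for v in vdcs)


gold = ProviderVDC("Gold", cpu_ghz=100, memory_gb=512)
finance = Organization("Finance", vdcs=[OrgVDC("fin-vdc", gold, 20, 64)])
marketing = Organization("Marketing", vdcs=[OrgVDC("mkt-vdc", gold, 30, 128)])
```

Calling `allocated(gold, [finance, marketing])` adds up what both tenants have reserved from the Gold tier, which is the kind of figure a provider would compare against the tier's capacity.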

11

Complete Cloud Suite

Virtualization (server, storage, network): vSphere

Management: vFabric Application Director, vCenter Operations Mgmt Suite, vCenter Site Recovery Manager

Cloud Infrastructure: vCloud Director, Software Defined Storage, Software Defined Networking, Software Defined Availability, Software Defined Security

Extensibility: vCloud APIs, vCloud Connector, vCenter Orchestrator

12

Virtualizing Hadoop

Project Serengeti

13

3 Big Reasons to Virtualize Hadoop

14

1. Virtualize Hardware

Workloads: SQL, Hadoop, DSS, NoSQL

Unified Big Data Infrastructure (Private, Public): Big SQL, Hadoop, NoSQL

15

2. Rapid Provisioning

I want my Hadoop cluster NOW!

16

3. Leverage Capabilities

Increase Utilization

No single points of failure

VM Isolation

Resource Management

17

What? Hadoop in a VM? Really?

Actually, Hadoop performs well in a virtual machine

18

Performance of Hadoop for Several Workloads

[Chart: ratio of time taken relative to native for several workloads. Series: 1 VM, 2 VMs. Y-axis: Ratio to Native, 0 to 1.2. Lower is better.]

19

Fast Provisioning

From a “seed” node to a cluster

Thin Provisioning: 60GB => 3.5GB

Linked Clone: ~6 seconds
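To put the thin-provisioning figure in context, the arithmetic can be sketched as below. It assumes the slide's numbers mean that a 60GB provisioned disk initially occupies about 3.5GB when thin-provisioned. The function name and the node count in the usage note are hypothetical, and the sketch ignores linked-clone sharing and growth as data is written.

```python
def cluster_disk_footprint_gb(nodes, provisioned_gb=60.0, thin_gb=3.5):
    """Return (thick_gb, thin_gb, savings_fraction) for a cluster of identical nodes.

    provisioned_gb and thin_gb default to the figures on the slide; treating
    them as per-node values is an assumption.
    """
    thick = nodes * provisioned_gb
    thin = nodes * thin_gb
    return thick, thin, 1 - thin / thick
```

For example, `cluster_disk_footprint_gb(10)` compares 600GB of fully provisioned disk with 35GB of initial thin-provisioned footprint.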

20

Being Efficient through Resource over-commitment

Memory over-commitment
• Hadoop JVMs hold onto memory even when not busy
• vSphere memory overcommit allows us to pack more Hadoop nodes per host
• If you use EM4J, this can be optimized further

Disk over-commitment
• Hadoop is designed for large datasets
• Thin provisioning is wonderful for saving disk footprint
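The packing effect of memory over-commitment can be illustrated with a rough calculation. This is a simplified sketch, not how vSphere schedules memory: the function, the overcommit ratio, and the host and node sizes are all hypothetical, and a real safe ratio depends on how much of each JVM's memory is actually active.

```python
def nodes_per_host(host_memory_gb, node_memory_gb, overcommit_ratio=1.0):
    """Count VMs whose configured memory fits in host memory times the ratio.

    overcommit_ratio=1.0 means no over-commitment. All inputs are
    illustrative assumptions, not measured values.
    """
    if overcommit_ratio < 1.0:
        raise ValueError("overcommit_ratio must be at least 1.0")
    return int(host_memory_gb * overcommit_ratio // node_memory_gb)
```

With a hypothetical 96GB host and 16GB Hadoop nodes, a ratio of 1.0 fits 6 nodes while a ratio of 1.5 fits 9.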

21

Performance

Create more, smaller VMs
• Makes Hadoop scale better
• A single large Hadoop node is limited by JVM scalability
• Allows for easier/faster adjustment of packing of VMs across hosts by vSphere (through DRS)

Sizing/configuration of storage is critical
• Plan on ~50 MB/s of bandwidth per core
• SAN ports/switches will limit performance
• SANs are typically configured by default for IOPS, not bandwidth
• Performance of the backend storage should be tested/sized
• Local disks will give ~100 MB/s per disk: pick the correct controller
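The two rules of thumb above (about 50 MB/s per core, about 100 MB/s per local disk) combine into a quick sizing check. The sketch below only applies that arithmetic. The function names are hypothetical, and it assumes the controller and bus can sustain the aggregate disk throughput, which the slide flags as something to verify.

```python
import math

MB_PER_SEC_PER_CORE = 50    # rule of thumb from the slide
MB_PER_SEC_PER_DISK = 100   # approximate local-disk figure from the slide


def required_bandwidth_mb_s(cores):
    """Storage bandwidth to plan for, given the number of cores."""
    return cores * MB_PER_SEC_PER_CORE


def local_disks_needed(cores):
    """Minimum number of local disks that supply the planned bandwidth."""
    return math.ceil(required_bandwidth_mb_s(cores) / MB_PER_SEC_PER_DISK)
```

By this estimate a 16-core host should be planned for 800 MB/s of storage bandwidth, which corresponds to 8 local disks.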

22

Summary

• Hadoop does work well in a virtual environment
• Plan a virtual cluster, enable other big-data solutions on the same infrastructure
• Leverage the recipes to automate your configuration and deployment

“The big glaring hole [with cloud] is data handling.”

- Adrian Kunzle, MD Head of Engineering & Architecture, JPMorgan Chase

24

New Ways to Work with Data

NoSQL
• In-memory
• Key/value pairs, simplicity, high productivity
• Different offerings, different data models: document, graph, big table, column

NewSQL
• In-memory
• Scalability benefits of in-memory systems with standardized SQL

Classic SQL
• Traditional RDBMS
• ACID (atomicity, consistency, isolation, durability)
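As a small illustration of the "A" in ACID, the sketch below uses Python's built-in sqlite3 module with an in-memory database. It is not tied to any product named in this deck, and the table, account names, and `transfer` helper are hypothetical. A failed transfer is rolled back so that no partial update survives.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

A plain key/value store would typically leave it to the application to undo the first update; here the database discards it when the transaction fails.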

25

How do you scale the data tier?

26

vFabric GemFire

Application Data Lives Here

Application Data Sleeps Here

27

Key Capabilities

Low-latency, linearly-scalable, memory-based data fabric

Data-aware execution

Active/continuous querying and event notification
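The idea behind continuous querying and event notification can be sketched with a toy in-memory key/value region: a client registers a predicate once and is notified whenever a later write matches it. This is a conceptual illustration only. The `Region` class and its methods are hypothetical and are not the GemFire API.

```python
class Region:
    """Toy in-memory key/value region with continuous queries."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, listener) pairs

    def register_query(self, predicate, listener):
        """Ask to be notified of every future put whose entry matches predicate."""
        self._queries.append((predicate, listener))

    def put(self, key, value):
        """Store the entry, then notify listeners whose predicate matches it."""
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(key, value):
                listener(key, value)

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)
```

A caller might register `lambda k, v: v > 100` with a listener that appends to a list; only puts with values above 100 would then reach the listener, without the caller polling the region.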

28

Primary Use Cases

Web session cache, L2 cache

App data cache, in-memory DB

Grid data fabric: client compute

Grid data fabric: fabric compute

vFabric Data Director

Automation: Provisioning, Backup / Restore, Clone, One-click HA

Self-Service: DBA, App Dev

Policy Based Control: Resource Mgmt, Security Mgmt, Database Templates, Monitor

Roles: IT Admin, DBA

Existing Applications New Applications

Public Cloud Private Cloud Hybrid Cloud

30

Big Data PaaS

Cloud Foundry & vFabric

31

Cloud Stack – Neutral View

SaaS

PaaS

IaaS

32

Cloud Stack – Classic Pyramid

SaaS PaaS IaaS

33

Cloud Stack – By Numbers

SaaS PaaS IaaS

34

Cloud Stack – By Value

SaaS

IaaS

PaaS

35

Big Data PaaS Architecture

Infrastructure as a Service (IaaS)

Coordination

Data Integration, Data Process, U-Data Store, Graph Store

Read / Write Access, Languages

Workflow Scheduling Metadata

Analytics

UI Framework, Other Application Services

Big Data API

Business Intelligence Applications

Cross-cutting: Application Lifecycle Management, Security, Systems Monitoring & Management

37

OSS community

38

Data Services

Other Services

Msg Services

vFabric Postgres

vFabric RabbitMQ™

Additional partners services …

39

Data Services

Other Services

Msg Services

Private Clouds

Public Clouds

Micro Clouds

.COM

Partners

40

VMware Cloud Application Platform

Virtual Datacenter Cloud Infrastructure and Management

Rich Web

Programming Model

Social and Mobile

Data Access

Integration Patterns

Batch Framework

WaveMaker Spring Tool Suite

Cloud Foundry

App Monitoring (Spring Insight)

Performance Mgmt (Hyperic)

Automated App Provisioning (AppDirector)

Java Optimizations (EM4J, …)

Java Runtime (tc Server)

Web Runtime (ERS)

Messaging (RabbitMQ)

Global Data (GemFire)

In-mem SQL (SQLFire)

41

Big Data SaaS

Cetas

42

Data Sources

43

On-Premise Installation

44

Cloud-based Installation

45

Summary

46


Flexible: OSS Relational, Document, Object, Key / Value

Fast: OLTP workloads, Analytic workloads

Big: Big Data Processing, Big Data Analytics

Products: Serengeti, GemFire, vPostgres
