© 2009 VMware Inc. All rights reserved
Cloud + Big Data – Putting it all Together Even Solberg
2
3
Big, Fast and Flexible Data
Cloud Delivery Model: Data as a service for private and public clouds
• Flexible: OSS Relational, Document, Object, Key / Value
• Fast: OLTP workloads, Analytic workloads
• Big: Big Data Processing, Big Data Analytics
4
Big, Fast and Flexible Data
Cloud Delivery Model: Data as a service for private and public clouds
• Flexible: OSS Relational, Document, Object, Key / Value
• Fast: OLTP workloads, Analytic workloads
• Big: Big Data Processing, Big Data Analytics
• Products shown: Serengeti, GemFire, vPostgres
5
Cloud Stack – Neutral View
SaaS
PaaS
IaaS
6
Big Data IaaS
7
…but first, some Background. How to build an IaaS Cloud
Central Infrastructure Management
Automated ITIL Process Including Approvals
Automated Provisioning Multi Tenancy IT Service Catalog Resource Distribution Resource Allocation
Customer A
Customer B
Customer C
Customer D
Service Catalog
Service Catalog
Service Catalog
Service Catalog
Users Groups
Users Groups
Users Groups
Users Groups
Administrative Interface / Resource Allocation and Definition
Out Of The Box Integration
Human Interaction
Integration must be built
Generate Ticket 1st Line Support
Performance Mgmt Resource Mgmt Capacity Mgmt Compliance Mgmt
Cost Models Usage Allocation Pay As You Go CB / SB
Exported Billing Information
Service Delivery Management
https://customer.portal.org Service Catalog Workflow engine SLA Descriptions Show back Billing Information Customer Portal
Service Owner
Network & Security Firewall VPN Load Balancer NAT
ITSM Ticketing Change Mgmt Support
Change & Release Mgmt
Service Renewal
Gold Silver Bronze
Cust C Cust B Cust A
vCNS
vCenter Operations Suite
Service Manager
DynamicOps
Service Manager
Application Director
Service Manager / ITBM
Service Manager
vSphere
vCenter Chargeback
vCloud Director
Organization: Finance Organization: Marketing
Org VDC Catalogs Org VDC Catalogs
VMware vSphere
VMware vCenter Server
Resource Pools Datastores Port Groups
Provider Virtual Datacenters
Gold Bronze Silver
Users & Policies Users & Policies
11
Complete Cloud Suite
• Virtualization (server, storage, network): vSphere
• Management: vFabric Application Director, vCenter Operations Mgmt Suite, vCenter Site Recovery Manager
• Cloud Infrastructure: vCloud Director, Software Defined Storage, Software Defined Networking, Software Defined Availability, Software Defined Security
• Extensibility: vCloud APIs, vCloud Connector, vCenter Orchestrator
12
Virtualizing Hadoop: Project Serengeti
13
3 Big Reasons to Virtualize Hadoop
14
1. Virtualize Hardware
SQL, Hadoop, DSS, NoSQL
Unified Big Data Infrastructure: Big SQL, Hadoop, NoSQL
Private, Public
15
2. Rapid Provisioning
I want my Hadoop cluster NOW!
16
3. Leverage Capabilities
Increase Utilization
No single points of failure
VM Isolation
Resource Management
17
What? Hadoop in a VM? Really?
Actually, Hadoop performs well in a virtual machine
18
Performance of Hadoop for Several Workloads
[Chart: ratio to native for 1 VM and 2 VMs; y-axis from 0 to 1.2. Ratio of time taken – lower is better.]
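The "ratio to native" metric on this chart can be sketched as a small calculation. This is a minimal illustration only: the function names and the timings in the usage note are hypothetical and are not measurements from the presentation.

```python
def ratio_to_native(virtual_seconds: float, native_seconds: float) -> float:
    """Elapsed time in the VM setup divided by elapsed time on bare metal.

    A value of 1.0 means parity; below 1.0 means the virtualized run
    finished faster than the native run (lower is better).
    """
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


def overhead_percent(ratio: float) -> float:
    """Convert a ratio to native into a percentage overhead (negative = faster)."""
    return (ratio - 1.0) * 100.0
```

For example, a hypothetical job taking 110 seconds virtualized and 100 seconds natively has a ratio of 1.1, i.e. roughly 10% overhead.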
19
Fast Provisioning
From a “seed” node to a cluster
• Thin Provisioning: 60GB => 3.5GB
• Linked Clone: ~6 seconds
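To put the 60GB => 3.5GB figure in context, the sketch below computes the initial disk footprint of a cluster with and without thin provisioning. The two per-node sizes come from the slide; the cluster size in the usage note is a hypothetical value, and the thin figure only covers the initial footprint, since thin disks grow as data is written.

```python
THICK_GB_PER_NODE = 60.0  # from the slide: fully provisioned disk size
THIN_GB_PER_NODE = 3.5    # from the slide: initial thin-provisioned footprint


def footprint_gb(nodes: int, gb_per_node: float) -> float:
    """Total disk footprint for a cluster of identical nodes."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * gb_per_node


def savings_fraction() -> float:
    """Fraction of disk space saved per node at provisioning time."""
    return 1.0 - THIN_GB_PER_NODE / THICK_GB_PER_NODE
```

For a hypothetical 20-node cluster this is 1200 GB thick versus 70 GB thin at provisioning time, a saving of about 94% per node.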
20
Being Efficient through Resource Over-commitment
Memory over-commitment
• Hadoop JVMs hold onto memory even when not busy
• vSphere memory overcommit allows us to pack more Hadoop nodes per host
• If you use EM4J, this can be optimized further
Disk over-commitment
• Hadoop is designed for large datasets
• Thin provisioning is very effective at saving disk footprint
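The effect of memory over-commitment on packing density can be sketched with simple arithmetic. This is an illustration under stated assumptions: the function, the host and VM sizes, and the over-commit ratio are all hypothetical, and a safe ratio in practice depends on how much of each JVM's memory is actually active.

```python
def nodes_per_host(host_memory_gb: float,
                   vm_memory_gb: float,
                   overcommit_ratio: float = 1.0) -> int:
    """Number of Hadoop VMs that fit on one host.

    overcommit_ratio = 1.0 means no over-commitment; 1.5 means the sum of
    configured VM memory may reach 150% of physical host memory.
    """
    if host_memory_gb <= 0 or vm_memory_gb <= 0 or overcommit_ratio <= 0:
        raise ValueError("all arguments must be positive")
    return int((host_memory_gb * overcommit_ratio) // vm_memory_gb)
```

With a hypothetical 256 GB host and 16 GB VMs, this gives 16 nodes without over-commitment and 24 nodes at a ratio of 1.5.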
21
Performance
Create more, smaller VMs
• Makes Hadoop scale better
• A single large Hadoop node is limited by JVM scalability
• Allows for easier/faster adjustment of packing of VMs across hosts by vSphere (through DRS)
Sizing/configuration of storage is critical
• Plan on ~50 MB/sec of bandwidth per core
• SAN ports/switches will limit performance
• SANs are typically configured by default for IOPS, not bandwidth
• Performance of the backend storage should be tested/sized
• Local disks will give ~100 MB/sec per disk: pick the correct controller
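The two storage rules of thumb above can be combined into a rough sizing calculation. This is a sketch, not a sizing tool: the constants are the slide's approximate figures, the function names are hypothetical, and real throughput should still be measured on the actual controller and disks.

```python
import math

BANDWIDTH_PER_CORE_MB_S = 50   # from the slide: ~50 MB/sec per core
BANDWIDTH_PER_DISK_MB_S = 100  # from the slide: ~100 MB/sec per local disk


def required_bandwidth_mb_s(cores: int) -> int:
    """Aggregate storage bandwidth to plan for on a host with this many cores."""
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return cores * BANDWIDTH_PER_CORE_MB_S


def local_disks_needed(cores: int) -> int:
    """Minimum number of local disks needed to supply that bandwidth."""
    return math.ceil(required_bandwidth_mb_s(cores) / BANDWIDTH_PER_DISK_MB_S)
```

For a hypothetical 16-core host this works out to about 800 MB/sec, or at least 8 local disks, which is also the bandwidth a SAN path would have to sustain for that host.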
22
Summary
• Hadoop does work well in a virtual environment
• Plan a virtual cluster, enable other big-data solutions on the same infrastructure
• Leverage the recipes to automate your configuration and deployment
“The big glaring hole [with cloud] is data handling.”
- Adrian Kunzle, MD, Head of Engineering & Architecture, JPMorgan Chase
24
New Ways to Work with Data
NoSQL
• In-memory
• Key/value pairs, simplicity, high productivity
• Different offerings, different data models: document, graph, big table, column
NewSQL
• In-memory
• Scalability benefits of in-memory systems with standardized SQL
Classic SQL
• Traditional RDBMS
• ACID (atomicity, consistency, isolation, durability)
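The contrast between key/value simplicity and ACID transactions can be illustrated with a short sketch. It is not tied to any product in this deck: a plain Python dict stands in for a key/value store, SQLite (from the Python standard library) stands in for a classic RDBMS, and the table, keys and amounts are hypothetical.

```python
import sqlite3

# Key/value style: one put and one get, no schema, no transaction.
kv_store = {}
kv_store["session:42"] = {"user": "alice", "cart": ["book"]}

# Classic SQL style: a schema with a constraint, changed inside transactions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(db: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move funds atomically: either both updates apply or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balance(db: sqlite3.Connection, name: str) -> int:
    """Read the current balance for one account."""
    return db.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()[0]
```

An overdraft attempt leaves both balances unchanged (atomicity and consistency), whereas the dict offers no such guarantee across multiple keys.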
25
How do you scale the data tier?
26
vFabric GemFire
Application Data Lives Here
Application Data Sleeps Here
27
Key Capabilities
Low-latency, linearly-scalable, memory-based data fabric
Data-aware execution
Active/continuous querying and event notification
28
Primary Use Cases
Web session cache, L2 cache
App data cache, in-memory DB
Grid data fabric: client compute
Grid data fabric: fabric compute
vFabric Data Director
Provisioning Backup / Restore Clone One click
HA DBA App Dev
Automation Self-Service
Resource Mgmt
Security Mgmt
Database Templates Monitor
IT Admin
Policy Based Control
DBA
Existing Applications New Applications
Public Cloud Private Cloud Hybrid Cloud
30
Big Data PaaS: Cloud Foundry & vFabric
31
Cloud Stack – Neutral View
SaaS
PaaS
IaaS
32
Cloud Stack – Classic Pyramid
SaaS PaaS IaaS
33
Cloud Stack – By Numbers
SaaS PaaS IaaS
34
Cloud Stack – By Value
SaaS
IaaS
PaaS
35
Big Data PaaS Architecture
Infrastructure as a Service (IaaS)
Coordination
Data Integration Data
Process U-Data Store
Graph Store
Read / Write
Access
Languages
Workflow Scheduling Metadata
Analytics
UI Framework Other Application Services
Big Data API
Business Intelligence Applications
Application Lifecycle Management
Security
Systems Monitoring & Management
37
OSS community
38
Data Services
Other Services
Msg Services
vFabric Postgres
vFabric RabbitMQ™
Additional partners services …
39
Data Services
Other Services
Msg Services
Private Clouds
Public Clouds
Micro Clouds
.COM
Partners
40
VMware Cloud Application Platform
Virtual Datacenter Cloud Infrastructure and Management
Rich Web
Programming Model
Social and Mobile
Data Access
Integration Patterns
Batch Framework
WaveMaker Spring Tool Suite
Cloud Foundry
App Monitoring (Spring Insight)
Performance Mgmt (Hyperic)
Automated App Provisioning (AppDirector)
Java Optimizations (EM4J, …)
Java Runtime (tc Server)
Web Runtime (ERS)
Messaging (RabbitMQ)
Global Data (GemFire)
In-mem SQL (SQLFire)
41
Big Data SaaS: Cetas
42
Data Sources
43
On-Premise Installation
44
Cloud-based Installation
45
Summary
46
Big, Fast and Flexible Data
Cloud Delivery Model: Data as a service for private and public clouds
• Flexible: OSS Relational, Document, Object, Key / Value
• Fast: OLTP workloads, Analytic workloads
• Big: Big Data Processing, Big Data Analytics
• Products shown: Serengeti, GemFire, vPostgres
47