VIRTUALIZING HADOOP ROMMEL GARCIA

Upload: rommel-garcia

Post on 15-Jan-2017

HADOOP USAGE

Deployment target (today / in 2 years):

▸ On public cloud infrastructure such as AWS or Google (off-premise): 21% / 21%, CAGR: 1%

▸ Virtualized servers in your data center (on-premise, virtualized): 39% / 51%, CAGR: 14%

▸ Unvirtualized servers in your data center (on-premise, unvirtualized): 40% / 28%, CAGR: -16%

Survey responses:

▸ Currently use: 26%

▸ Actively evaluating: 21%

▸ Have evaluated but decided not to use: 8%

▸ May consider it in the future: 30%

▸ No interest whatsoever: 9%

▸ Never heard of it: 5%

▸ Don't know: 2%

▸ Other: 0%

Source: Internal VMware Core Metrics Study, July 2015
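
The CAGR figures on this slide can be sanity-checked from the today and in-2-years shares. The sketch below assumes the usual compound annual growth rate formula over a two-year horizon, and pairs 39% → 51% with on-premise virtualized, 40% → 28% with on-premise unvirtualized, and 21% → 21% with off-premise. The function name `cagr` is illustrative, not something from the study.

```python
def cagr(start_share: float, end_share: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_share <= 0 or years <= 0:
        raise ValueError("start_share and years must be positive")
    return (end_share / start_share) ** (1.0 / years) - 1.0


# Shares read off the slide (today -> in 2 years), as percentages.
shares = {
    "on-premise, virtualized": (39, 51),
    "on-premise, unvirtualized": (40, 28),
    "off-premise": (21, 21),
}

for name, (today, later) in shares.items():
    print(f"{name}: {cagr(today, later, 2) * 100:+.1f}% per year")
```

With the rounded shares this gives roughly 14% and -16% for the two on-premise categories, matching the slide. The off-premise value comes out flat rather than 1%, presumably because the study computed it from unrounded shares.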

COMMODITY VS. APPLIANCE

VIRTUALIZATION HARDWARE

VIRTUALIZATION PLATFORM

SCENARIO 1

▸ SAN Storage (LUN)

▸ Generic Blade Servers for Compute

▸ 1/10 GbE Network

▸ VM sizes are typically small

    ▸ 4 vCPU

    ▸ 32 GB vRAM

VIRTUALIZATION PLATFORM

SCENARIO 2

▸ Storage Appliance for Hadoop

    ▸ EMC Isilon

    ▸ NetApp Open Solution

▸ Purpose-built Virtualization Blade Servers for Compute

▸ Fabric Interconnect/InfiniBand

▸ VM sizes are typically bigger

    ▸ up to 16 vCPU

    ▸ up to 120 GB vRAM

VIRTUALIZATION PLATFORM

SCENARIO 3

▸ Local Storage for Hadoop

▸ Rack Mounted Servers

▸ 1/10 GbE Network

▸ VM sizes are typically bigger

    ▸ up to 16 vCPU

    ▸ up to 120 GB vRAM
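
To make the VM sizes in the three scenarios concrete, here is a rough sizing sketch. It only uses the vCPU and vRAM figures from the slides. The host specification (32 cores, 256 GB RAM) and the `reserved_ram_gb` allowance for the hypervisor are hypothetical values chosen for illustration, and the function assumes no CPU or memory overcommit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSize:
    vcpu: int
    vram_gb: int


# VM sizes quoted on the slides.
SMALL = VMSize(vcpu=4, vram_gb=32)     # Scenario 1
LARGE = VMSize(vcpu=16, vram_gb=120)   # Scenarios 2 and 3 (upper bound)


def vms_per_host(host_cores: int, host_ram_gb: int, vm: VMSize,
                 reserved_ram_gb: int = 16) -> int:
    """Number of VMs of size `vm` that fit without overcommitting CPU or RAM.

    reserved_ram_gb is a hypothetical allowance for the hypervisor itself.
    """
    usable_ram = max(host_ram_gb - reserved_ram_gb, 0)
    return min(host_cores // vm.vcpu, usable_ram // vm.vram_gb)


# Hypothetical host: 32 cores, 256 GB RAM.
print(vms_per_host(32, 256, SMALL))
print(vms_per_host(32, 256, LARGE))
```

On this assumed host, memory is the limiting resource for the small VMs, while CPU and memory run out together for the large ones.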

VIRTUALIZATION PLATFORM OF CHOICE

COMMON CHOICE

▸ VMware vSphere

    ▸ ahead of the curve and considerably more mature

    ▸ Big Data Extensions (BDE) provisions Hadoop

▸ OpenStack

    ▸ newer; the only open source choice, and it shows a lot of promise

CAN WE USE IT FOR POC, DEV, UAT, PROD?

THE ANSWER IS YES.

REAL-WORLD SETUP

VIRTUALIZATION ARCHITECTURE

QUICK REVIEW ON HADOOP ARCHITECTURE

HADOOP ARCHITECTURE

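[Figure: Hadoop architecture. An input file is divided into three 64 MB splits (Split1 to Split3), stored as three 64 MB blocks (Block1 to Block3) across WorkerNode1 to WorkerNode3. Each worker node runs a Datanode and a Nodemanager. A job is submitted to the Resourcemanager; AppMaster-1, Container-2, and Container-3 run on the worker nodes. The Namenode and Resourcemanager are master roles.]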

Image credit: VMware
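
The split arithmetic in the architecture review can be sketched in a few lines. The code assumes a fixed 64 MB block size, as in the diagram, and a 192 MB input file so that it yields three splits. The round-robin placement is a deliberate simplification for illustration: a real YARN scheduler places containers based on data locality and available resources. Function names are hypothetical.

```python
MB = 1024 * 1024


def split_sizes(file_size: int, block_size: int = 64 * MB) -> list[int]:
    """Sizes of the splits a file of `file_size` bytes is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def assign_round_robin(n_splits: int, workers: list[str]) -> dict[str, str]:
    """Simplified placement: split i goes to worker i modulo the worker count."""
    return {f"Split{i + 1}": workers[i % len(workers)] for i in range(n_splits)}


workers = ["WorkerNode1", "WorkerNode2", "WorkerNode3"]
sizes = split_sizes(192 * MB)
placement = assign_round_robin(len(sizes), workers)

for split, worker in placement.items():
    print(split, "->", worker)
```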

VIRTUALIZATION ARCHITECTURE

HADOOP WITH ISILON

[Figure: Hadoop with Isilon. A virtualization host runs Hadoop Virtual Nodes 1 to 3. Each virtual node has an OS image VMDK and a temp VMDK, formatted Ext4; one node runs the Resourcemanager and the others run Nodemanagers. HDFS is served from Isilon shared storage/NAS, which provides the NN (Namenode) and datanode services.]

Image credit: VMware
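
In the Isilon layout the Namenode and datanode services live on the shared storage rather than inside the VMs, so the compute VMs have to be pointed at that endpoint. A minimal sketch of what that could look like, assuming the standard Hadoop property `fs.defaultFS` in `core-site.xml`: the hostname `isilon.example.com` and port 8020 are placeholders, not values from the deck.

```python
import xml.etree.ElementTree as ET


def core_site(default_fs: str) -> str:
    """Render a minimal core-site.xml body that sets fs.defaultFS."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(conf, encoding="unicode")


# Placeholder endpoint for the shared-storage HDFS service.
xml_text = core_site("hdfs://isilon.example.com:8020")
print(xml_text)
```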

VIRTUALIZATION ARCHITECTURE

DAS WITH HADOOP

[Figure: DAS with Hadoop. A virtualization host server runs Hadoop Node 1 and Hadoop Node 2 virtual machines. Each VM runs a Datanode and a Nodemanager on Ext4 file systems backed by VMDKs, with six local DAS disks per virtual machine.]

Image credit: VMware
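
With six local DAS disks per virtual machine, each disk typically becomes one Ext4 mount that is listed in the Datanode and Nodemanager directory settings. The sketch below builds those comma-separated lists. `dfs.datanode.data.dir` and `yarn.nodemanager.local-dirs` are standard Hadoop properties; the mount points (`/data/1` to `/data/6`) and subdirectory names are assumptions for illustration.

```python
def data_dirs(n_disks: int, mount_prefix: str = "/data",
              subdir: str = "hdfs/dn") -> str:
    """Comma-separated directory list, one entry per mounted disk."""
    return ",".join(f"{mount_prefix}/{i}/{subdir}" for i in range(1, n_disks + 1))


# Six disks per VM, as in the DAS diagram; paths are hypothetical.
props = {
    "dfs.datanode.data.dir": data_dirs(6),
    "yarn.nodemanager.local-dirs": data_dirs(6, subdir="yarn/local"),
}

for key, value in props.items():
    print(f"{key}={value}")
```

Spreading both lists over all six disks lets HDFS block I/O and YARN scratch I/O use every spindle instead of queueing on one.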

VIRTUALIZATION ARCHITECTURE

STORAGE DISK LAYOUT

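[Figure: Storage disk layout. Layers: hardware (local disks), hypervisor, vSAN, virtual machine. The Hadoop master node VM has an OS image VMDK with Ext4 and runs the master role. The Hadoop slave node VM has an OS image VMDK plus data VMDKs with Ext4 and runs the Datanode and NodeManager.]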

Image credit: VMware

SOME BENCHMARKS

VSPHERE 6 RESULTS - 32 HOSTS, 23 DISKS PER HOST - 2014 REPORT

http://www.vmware.com/resources/techresources/10452

VM LAYOUT MATTERS

LARGE DEPLOYMENT ARCHITECTURE

DEPLOYMENT LAYOUT

LAYOUT 1: ONE VSPHERE CLUSTER PER RACK

Rack01     Rack02     Rack03     Rack04     Rack05     Rack06     Rack07     Rack08
Cluster01  Cluster02  Cluster03  Cluster04  Cluster05  Cluster06  Cluster07  Cluster08
host001    host005    host009    host013    host017    host021    host025    host029
host002    host006    host010    host014    host018    host022    host026    host030
host003    host007    host011    host015    host019    host023    host027    host031
host004    host008    host012    host016    host020    host024    host028    host032
host033    host037    host041    host045    host049    host053    host057    host061
host034    host038    host042    host046    host050    host054    host058    host062
host035    host039    host043    host047    host051    host055    host059    host063
host036    host040    host044    host048    host052    host056    host060    host064
host065    host069    host073    host077    host081    host085    host089    host093
host066    host070    host074    host078    host082    host086    host090    host094
host067    host071    host075    host079    host083    host087    host091    host095
host068    host072    host076    host080    host084    host088    host092    host096
host097    host101    host105    host109    host113    host117    host121    host125
host098    host102    host106    host110    host114    host118    host122    host126
host099    host103    host107    host111    host115    host119    host123    host127
host100    host104    host108    host112    host116    host120    host124    host128
host129    host133    host137    host141    host145    host149    host153    host157
host130    host134    host138    host142    host146    host150    host154    host158
host131    host135    host139    host143    host147    host151    host155    host159
host132    host136    host140    host144    host148    host152    host156    host160

Image credit: VMware

DEPLOYMENT LAYOUT

LAYOUT 2: CROSS-RACK CLUSTER LAYOUT

Rack01 Rack02 Rack03 Rack04 Rack05 Rack06 Rack07 Rack08

Cluster1

host001 host005 host009 host013 host017 host021 host025 host029

host002 host006 host010 host014 host018 host022 host026 host030

host003 host007 host011 host015 host019 host023 host027 host031

host004 host008 host012 host016 host020 host024 host028 host032

Cluster2

host033 host037 host041 host045 host049 host053 host057 host061

host034 host038 host042 host046 host050 host054 host058 host062

host035 host039 host043 host047 host051 host055 host059 host063

host036 host040 host044 host048 host052 host056 host060 host064

Cluster3

host065 host069 host073 host077 host081 host085 host089 host093

host066 host070 host074 host078 host082 host086 host090 host094

host067 host071 host075 host079 host083 host087 host091 host095

host068 host072 host076 host080 host084 host088 host092 host096

Cluster4

host097 host101 host105 host109 host113 host117 host121 host125

host098 host102 host106 host110 host114 host118 host122 host126

host099 host103 host107 host111 host115 host119 host123 host127

host100 host104 host108 host112 host116 host120 host124 host128

Cluster5

host129 host133 host137 host141 host145 host149 host153 host157

host130 host134 host138 host142 host146 host150 host154 host158

host131 host135 host139 host143 host147 host151 host155 host159

host132 host136 host140 host144 host148 host152 host156 host160

Image credit: VMware
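
The host numbering in Layouts 1 and 2 follows a regular pattern, so the rack and cluster of any host can be derived from its number. The sketch below encodes that pattern for hosts 001 to 160 as read from the two layout slides: eight racks, four consecutively numbered hosts per rack in each band of 32. The function names are illustrative.

```python
HOSTS_PER_RACK_GROUP = 4   # consecutive host numbers that share a rack
RACKS = 8
BAND = HOSTS_PER_RACK_GROUP * RACKS   # 32 hosts per horizontal band


def rack_of(host: int) -> int:
    """Rack number (1-8) that physically holds the given host."""
    return ((host - 1) % BAND) // HOSTS_PER_RACK_GROUP + 1


def cluster_layout1(host: int) -> int:
    """Layout 1: one vSphere cluster per rack."""
    return rack_of(host)


def cluster_layout2(host: int) -> int:
    """Layout 2: each cluster is a band of 32 hosts spanning all 8 racks."""
    return (host - 1) // BAND + 1


for host in (1, 37, 73, 109, 145):
    print(f"host{host:03d}: rack {rack_of(host):02d}, "
          f"layout 1 cluster {cluster_layout1(host)}, "
          f"layout 2 cluster {cluster_layout2(host)}")
```

In Layout 1 a rack failure takes out a whole vSphere cluster, while in Layout 2 it removes four hosts from each of the five clusters. The five hosts printed here are the ones that carry the master VMs in the roles slide, and under this numbering each sits in a different rack and a different Layout 2 cluster.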

DEPLOYMENT LAYOUT

VIRTUAL MACHINE ROLES - MASTERS AND CLIENTS

Master VMs (each VM: Disk: 8192 GB, RAM: 120 GB, vCPU: 16)

Host     host001          host037          host073         host109        host145
VM       mst01            mst02            mst03           mst04          mst05
Roles    NAMENODE         RESOURCEMANAGER  HIVE_METASTORE  OOZIE_SERVER   NAGIOS_SERVER
         RESOURCEMANAGER  NAMENODE         HIVE_SERVER     FALCON_SERVER  GANGLIA_SERVER

Additional master roles, distributed across mst01 to mst05: JOURNALNODE (x3), OOZIE_SERVER, ZKFC (x2), MYSQL_SERVER*, APP_TIMELINE_SERVER*, HISTORYSERVER, WEBHCAT_SERVER*, SECONDARY_NAMENODE*

On every master VM: ZOOKEEPER_SERVER, GANGLIA_MONITOR

Client VMs (cln01 on each of the five hosts; each VM: Disk: 8192 GB, RAM: 120 GB, vCPU: 16)

On every client VM: PIG, SQOOP, HIVE_CLIENT, MAPREDUCE2_CLIENT, HDFS_CLIENT, YARN_CLIENT, ZOOKEEPER_CLIENT, OOZIE_CLIENT, FALCON_CLIENT, GANGLIA_MONITOR

Image credit: VMware

DEPLOYMENT LAYOUT

VIRTUAL MACHINE ROLES - WORKERS

Workers (each VM: Disk: 8192 GB, RAM: 120 GB, vCPU: 16)

▸ host002, host003, … host159, host160: worker VMs wrk01 and wrk02 on each host

▸ host225, host226, … host239, host240: worker VMs wrk01 and wrk02 on each host

On every worker VM: NODEMANAGER, DATANODE, GANGLIA_MONITOR

Image credit: VMware
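
The role names on the two roles slides look like Ambari component names, so one way to capture the worker and client templates is an Ambari-style blueprint. This is a partial sketch under that assumption: the deck does not say how the cluster was provisioned, the blueprint name, stack name, and stack version are placeholders, and a usable blueprint would also need host groups for the master roles.

```python
import json

# Roles listed on every worker and client VM in the slides.
WORKER_COMPONENTS = ["NODEMANAGER", "DATANODE", "GANGLIA_MONITOR"]
CLIENT_COMPONENTS = [
    "PIG", "SQOOP", "HIVE_CLIENT", "MAPREDUCE2_CLIENT", "HDFS_CLIENT",
    "YARN_CLIENT", "ZOOKEEPER_CLIENT", "OOZIE_CLIENT", "FALCON_CLIENT",
    "GANGLIA_MONITOR",
]


def host_group(name: str, components: list[str]) -> dict:
    """One blueprint host group: a named set of components."""
    return {"name": name, "components": [{"name": c} for c in components]}


blueprint = {
    # Placeholder metadata; not taken from the deck.
    "Blueprints": {
        "blueprint_name": "virtual-hadoop",
        "stack_name": "HDP",
        "stack_version": "2.2",
    },
    "host_groups": [
        host_group("workers", WORKER_COMPONENTS),
        host_group("clients", CLIENT_COMPONENTS),
    ],
}

print(json.dumps(blueprint, indent=2))
```

Keeping one template per VM role mirrors the slides, where every wrk and cln VM carries an identical role set and only the masters differ from one another.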

DEPLOYMENT LAYOUT

LAYOUT 3: EXPANDED RACK LAYOUT (HADOOP/ANALYTICS APPS)

Rack09 Rack10 Rack11 Rack12 Rack13 Rack14 Rack15 Rack16

Cluster6

host161 host165 host169 host173 host177 host181 host185 host189

host162 host166 host170 host174 host178 host182 host186 host190

host163 host167 host171 host175 host179 host183 host187 host191

host164 host168 host172 host176 host180 host184 host188 host192

Cluster7

host193 host197 host201 host205 host209 host213 host217 host221

host194 host198 host202 host206 host210 host214 host218 host222

host195 host199 host203 host207 host211 host215 host219 host223

host196 host200 host204 host208 host212 host216 host220 host224

Cluster8

host225 host227 host229 host231 host233 host235 host237 host239

host226 host228 host230 host232 host234 host236 host238 host240

Legend: ESXi cluster; Power rack; Master Node; Worker Node; 1:1 High Mem; Master Node (Analytics App); Worker Node (Analytics App)

Image credit: VMware

?