Performance of Hadoop on OpenStack Andrew Lazarev Mirantis, 2014

Page 1

Performance of Hadoop on OpenStack

Andrew Lazarev, Mirantis, 2014

Page 2

Agenda

● Introduction
● Environment description
● Direct virtualization impact
● Real-life workload
● Data locality
● Conclusion

Page 3

What Is Hadoop?

● Core Apache Hadoop:
  ○ HDFS (File System)
  ○ MapReduce (Programming Framework)
● Ecosystem projects:
  ○ Ambari (Management)
  ○ ZooKeeper (Coordination)
  ○ Oozie (Scheduling)
  ○ HBase (NoSQL Store)
  ○ Pig (Data Flow)
  ○ Hive (SQL)
  ○ Storm (Real-time computation)

Page 4

Why Virtualize Hadoop?

● Easy-to-operate clusters
● One-click self-service provisioning
● Sharing hardware between several Hadoop clusters
● Tenant isolation at the hypervisor and network layers
● Comparable performance with much more flexibility

Page 5

How To Virtualize?

● Sahara - OpenStack Data Processing project
  ○ Integrated with OpenStack
  ○ Supports Hadoop 1 and 2
  ○ Multiple vendor distributions (Apache, Hortonworks, Intel*)
  ○ Cluster provisioning and on-demand job execution

Page 6

Virtualization Impact

Direct impact:
● Disk write
● Disk read
● Network
● CPU

Page 7

Virtualization Impact

Indirect impact:
● Lack of low-level system control
● Resources consumed by hypervisor operation

Page 9

Environment

● Mirantis OpenStack Express cluster
● 20 nodes
● CPU: 24 x 2.10 GHz (2 x Intel Xeon CPU E5-2620)
● Memory: 8 x 4.0 GB, 32.0 GB total
● Disk: 1 drive, 0.9 TB (WDC WD1003FBYX-0)
● Network: 2 x 1 GbE

Page 10

Environment (continuation)

● Host OS: CentOS 6.5
● VM OS: CentOS 6.5
● Mirantis OpenStack
● QEMU-KVM 1.2.0
● Network: Neutron + GRE
● Open vSwitch 1.10.2

Page 11

Environment (continuation)

● Hadoop: Vanilla Apache 1.2.1
● Bare metal setup:
  ○ 19 Hadoop nodes
● OpenStack setup:
  ○ 1 controller + 19 compute nodes
  ○ 19 (or 57) VMs with Hadoop

Page 13

Disk Write (using dd)

*greater is better
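The slides do not give the exact dd command line. As a rough illustration of what a dd-style sequential write test measures, here is a minimal Python sketch, assuming a fixed block size and block count and a final flush to disk (comparable in spirit to dd's `conv=fdatasync`). The function name and defaults are hypothetical.

```python
import os
import tempfile
import time


def sequential_write_mbps(path, block_size=1024 * 1024, count=64):
    """Write `count` blocks of `block_size` zero bytes to `path`; return MB/s.

    Hypothetical helper mimicking a dd-style sequential write test. The final
    os.fsync() forces data out of the OS page cache, so the timing includes
    the actual flush to the device (similar in spirit to conv=fdatasync).
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()              # move Python's buffer into the OS
        os.fsync(f.fileno())   # move the OS page cache to the device
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    total_mb = block_size * count / (1024 * 1024)
    return total_mb / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "ddtest.bin")
        print(f"{sequential_write_mbps(target):.1f} MB/s")
```

Without the fsync, the measurement would mostly reflect memory speed of the page cache rather than the disk, which matters when comparing guest and host numbers.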

Page 14

Disk Write (Hadoop test)

● TestDFSIO - built-in Hadoop IO test
  ○ write test
  ○ read test
  ○ 1000 files of 1 GB (1 TB total)
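TestDFSIO summarizes a run with a few aggregate figures. As we understand the Hadoop 1.x implementation, "Throughput" divides the total data size by the sum of per-task I/O times, while "Average IO rate" is the mean of the per-task rates, reported with a standard deviation. The sketch below reproduces that arithmetic in Python for illustration only; the `TaskResult` type and `summarize` function are hypothetical, not Hadoop APIs.

```python
import math
from dataclasses import dataclass

MEGA = 1024 * 1024  # sizes are reported in units of 2^20 bytes


@dataclass
class TaskResult:
    """One map task's result (hypothetical record type)."""
    size_bytes: int   # bytes written or read by the task
    time_ms: float    # time the task spent doing I/O


def summarize(tasks):
    """Return (throughput_mb_s, avg_io_rate_mb_s, io_rate_stddev)."""
    total_mb = sum(t.size_bytes for t in tasks) / MEGA
    total_s = sum(t.time_ms for t in tasks) / 1000.0
    # Aggregate throughput: all data over the *sum* of task times.
    throughput = total_mb / total_s
    # Per-task rates, then their mean and population standard deviation.
    rates = [(t.size_bytes / MEGA) / (t.time_ms / 1000.0) for t in tasks]
    mean = sum(rates) / len(rates)
    variance = sum(r * r for r in rates) / len(rates) - mean * mean
    return throughput, mean, math.sqrt(abs(variance))
```

For example, two tasks that each move 1000 MB, one in 10 s and one in 20 s, give a throughput of about 66.7 MB/s, an average IO rate of 75 MB/s and a standard deviation of 25 MB/s. Because throughput divides by summed task times, it describes per-task speed; aggregate cluster bandwidth is roughly that figure multiplied by the number of concurrently running tasks.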

Page 15

Disk Write (Hadoop test)

*less is better

Page 16

Disk Write (Hadoop test)

*less is better

Page 17

Disk Cache Mode

● “disk_cachemodes” parameter in nova.conf
  ○ writethrough (default) - guest disk write cache is disabled
  ○ writeback - guest disk write cache is enabled
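The option takes a comma-separated list of `source_type=cache_mode` pairs, for example `file=writeback`. The Python sketch below parses such a value and renders a nova.conf fragment. The `[libvirt]` section name matches newer Nova releases and is an assumption here (older releases read the option from `[DEFAULT]`), so check the configuration reference for your release; the helper names are hypothetical.

```python
import configparser
import io

# Cache modes accepted by libvirt's disk <driver cache="..."> attribute.
VALID_CACHE_MODES = {"default", "none", "writethrough", "writeback",
                     "directsync", "unsafe"}


def parse_disk_cachemodes(value):
    """Parse 'file=writeback,block=none' into {'file': 'writeback', ...}."""
    modes = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        source_type, _, mode = item.partition("=")
        if mode not in VALID_CACHE_MODES:
            raise ValueError(f"unknown cache mode {mode!r} for {source_type!r}")
        modes[source_type] = mode
    return modes


def render_nova_fragment(modes, section="libvirt"):
    """Render a nova.conf fragment (section name is an assumption)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[section] = {
        "disk_cachemodes": ",".join(f"{k}={v}" for k, v in modes.items())
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Writeback acknowledges guest writes once they reach the host page cache, which is why it speeds up writes; the trade-off is that recently written data can be lost if the host crashes before the cache is flushed.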

Page 18

Disk Write (dd, writeback cache)

● Writeback cache enabled
● One large VM per host, with all of the host's memory

Page 19

Disk Write (dd, writeback cache)

*greater is better

Page 20

Disk Write (Hadoop test, writeback cache)

*less is better

Page 21

Disk Write - Way To Improve

● QEMU 1.4:
  ○ High-performance virtio-blk data plane implementation
  ○ +108.0% on random write (based on a Red Hat presentation at KVM Forum)

Page 22

Disk Read (using hdparm)

*greater is better

Page 23

Disk Read (using hdparm)

*greater is better

Page 24

Disk Read (Hadoop test)

*less is better

Page 25

Network (OVS+GRE)

*greater is better

Page 26

CPU (Hadoop test)

● PI - built-in Hadoop test
● Depends mostly on CPU
● 50 series of 10,000,000,000 probes
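Hadoop's bundled pi example estimates π by quasi-Monte Carlo sampling: to our understanding it draws points from a 2-D Halton sequence (bases 2 and 3), counts how many fall inside the circle inscribed in the unit square, and gives each map task its own slice of the sequence. Reading the slide's "50 series" as 50 map tasks of 10^10 samples each is our interpretation. The Python sketch below follows the same idea at toy scale; it is not Hadoop's code, and the function names are hypothetical.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in `base`, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def map_task(offset, n_samples):
    """'Map': count Halton points inside the circle of radius 0.5."""
    inside = 0
    for i in range(offset + 1, offset + n_samples + 1):  # skip index 0 = (0, 0)
        x = radical_inverse(i, 2) - 0.5
        y = radical_inverse(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    return inside


def estimate_pi(n_maps, n_samples):
    """'Reduce': sum per-map counts; circle/square area ratio is pi/4."""
    inside = sum(map_task(m * n_samples, n_samples) for m in range(n_maps))
    return 4.0 * inside / (n_maps * n_samples)
```

The work is almost pure arithmetic with tiny input and output, which is why this test isolates CPU overhead from disk and network effects.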

Page 27

CPU (Hadoop test)

*less is better

Page 29

TeraSort

● Built-in Hadoop test
● Represents a real Hadoop workload
● Involves:
  ○ IO
  ○ Networking
  ○ Computation
● Sorting 200,000,000 100-byte entries (20 GB)
● Writeback cache enabled
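TeraSort records are 100 bytes each (a 10-byte key followed by a 90-byte value), so 200,000,000 records × 100 bytes = 20 × 10^9 bytes, the 20 GB above. To get one globally sorted output across many reducers, TeraSort samples input keys to choose split points, so each reducer receives one contiguous key range; sorting each partition and concatenating them yields a total order. The Python sketch below shows that sampling-based range partitioning at toy scale. It is a simplified illustration (TeraSort itself, to our understanding, looks up split points with a trie-based partitioner), and all names are hypothetical.

```python
import bisect
import random

KEY_LEN, RECORD_LEN = 10, 100  # 10-byte key + 90-byte value per record


def make_records(n, rng):
    """Generate n random 100-byte records (toy stand-in for TeraGen)."""
    return [rng.randbytes(RECORD_LEN) for _ in range(n)]


def choose_split_points(records, n_reducers, sample_size, rng):
    """Sample keys and pick n_reducers - 1 evenly spaced cut points."""
    sample = sorted(r[:KEY_LEN] for r in rng.sample(records, sample_size))
    step = len(sample) / n_reducers
    return [sample[int(step * i)] for i in range(1, n_reducers)]


def terasort(records, n_reducers=4, sample_size=100, seed=0):
    rng = random.Random(seed)
    splits = choose_split_points(records, n_reducers, sample_size, rng)
    # "Map" side: route each record to the reducer owning its key range.
    partitions = [[] for _ in range(n_reducers)]
    for rec in records:
        partitions[bisect.bisect_right(splits, rec[:KEY_LEN])].append(rec)
    # "Reduce" side: sort each partition; concatenation is globally sorted.
    output = []
    for part in partitions:
        output.extend(sorted(part, key=lambda r: r[:KEY_LEN]))
    return output
```

Every record crosses the network once during the shuffle and is written to disk on both sides, which is why this job exercises IO, networking and computation together.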

Page 30

TeraSort

*less is better

Page 32

Data Locality

● Hadoop can take the “distance” between nodes into account
● Intelligent task scheduling
● Reading data from “close” data nodes
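Hadoop models the cluster as a tree of locations (for example `/datacenter/rack/node`) and, as we understand its NetworkTopology class, measures the distance between two nodes as the number of hops from each one up to their closest common ancestor: 0 for the same node, 2 within a rack, 4 across racks. For virtualized clusters, an extra node-group level can stand for the physical host, so VMs sharing a host come out closer than VMs that merely share a rack. The Python sketch below computes that distance from path strings and orders replicas by it; the paths and function names are hypothetical.

```python
def distance(path_a, path_b):
    """Hops from each node up to their closest common ancestor, summed."""
    a = [p for p in path_a.split("/") if p]
    b = [p for p in path_b.split("/") if p]
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def closest_replicas(reader, replicas):
    """Order replica locations so the 'closest' copy is read first."""
    return sorted(replicas, key=lambda r: distance(reader, r))
```

For a reader at `/rack1/host1/vm1`, a replica on `/rack1/host1/vm2` (same physical host) is at distance 2, one on `/rack1/host2/vm3` at 4, and one on `/rack2/host3/vm5` at 6, so the same-host copy is preferred.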

Page 33

Data Locality

*greater is better

Page 34

Data Locality

● Network within a host is comparable to disk speed
● Allows Hadoop process isolation (one VM per process)
● Test:
  ○ 1 master node (JobTracker + NameNode)
  ○ 18 DataNodes
  ○ 18 TaskTrackers
  ○ TeraSort of 20 GB data
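Hadoop learns where each node sits from a user-supplied topology script (configured via `topology.script.file.name` in Hadoop 1.x): Hadoop invokes it with host names or IP addresses as arguments and reads back one network location per argument. Below is a minimal Python sketch of such a script, assuming a hypothetical static table that maps each VM to `/rack/host`, so that a DataNode VM and a TaskTracker VM on the same physical host share a location. A real deployment would generate the table from the cloud's placement data, and whether the extra host level is honored depends on enabling node-group-aware topology in your Hadoop version.

```python
import sys

# Hypothetical placement table: VM address -> /rack/physical-host.
LOCATIONS = {
    "10.0.0.11": "/rack1/host1",  # DataNode VM on host1
    "10.0.0.12": "/rack1/host1",  # TaskTracker VM on host1
    "10.0.0.21": "/rack1/host2",
}
DEFAULT_LOCATION = "/default-rack"  # Hadoop's conventional fallback rack


def resolve(hosts):
    """Return one network location per host, in the same order."""
    return [LOCATIONS.get(h, DEFAULT_LOCATION) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts as arguments and reads whitespace-separated output.
    print(" ".join(resolve(sys.argv[1:])))
```

Keeping the mapping in one table makes it easy to regenerate whenever VMs are rescheduled to different hosts.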

Page 35

TeraSort (data locality)

*less is better

Page 37

Conclusion

● Only 6% performance impact for the composite test (TeraSort)
● Performance continuously improving with external library upgrades (QEMU, Open vSwitch)
● Much more topology flexibility
● Isolation at low cost:
  ○ between clusters
  ○ between nodes within a cluster

Page 38

Q&A

Page 39

Thank you!

Andrew Lazarev
Launchpad/GitHub/IRC: alazarev
E-Mail: [email protected]