the zoo expands: labrador *loves* elephant, thanks to hamster
DESCRIPTION
Refactoring the Hadoop MapReduce framework to separate resource management (YARN) from job execution (MapReduce) has allowed multiple programming paradigms to take advantage of massive-scale Hadoop Distributed File System (HDFS) clusters. Hamster (Hadoop And Mpi on the same cluSTER) is a port of OpenMPI that uses YARN as its resource manager. Hamster allows applications written using MPI (Message Passing Interface) to run alongside other YARN applications and frameworks, such as MapReduce, on the same Hadoop cluster. In this talk, I will describe the architecture of Hamster and present a few MPI applications that have been demonstrated to run in Hadoop. GraphLab uses MPI as one of its supported communication libraries and can read and write data from and to HDFS. I will describe how GraphLab runs on top of Hadoop using Hamster and present a few benchmarks in graph analytics, comparing GraphLab with other machine learning frameworks.
TRANSCRIPT
The Zoo Expands: Labrador ♥ Elephant, Thanks to Hamster
Milind Bhandarkar, Chief Scientist, Pivotal Software, Inc.
About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)
Hamster
• Hadoop and MPI on the same cluster
• Runtime for OpenMPI applications on YARN
• Available on Pivotal HD
Why MPI?
• Hadoop dataflow paradigms (MapReduce, Tez, etc.) not suitable for iterative applications
• Message Passing Interface (MPI)
• Mature standard
• Used extensively in HPC
• Huge ecosystem
MPI in Science & Engg
Earth Atmosphere
Chemistry
Biology
Math Nuclear
MPI in Industry
Mechanical
Finance/bank Oil Exploration Cryptography
Spacecraft
OpenMPI
• Mature Open Source implementation of MPI 3.0 Standard (mpi-forum.org)
• New BSD license
• 30+ contributing organizations from academia, research and industry
• http://open-mpi.org
OpenMPI Architecture
Pluggable
Hamster Design
• YARN as Resource Manager
• Hamster Application Manager
• Manages MPI jobs
• (tries to) Implement Gang-Scheduling
• Leverages OMPI/ORTE strengths
• Wire-up, Task monitoring, Fast Interconnect
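Gang scheduling here means that an MPI job should start only once every rank has a container, because the ranks wire up with one another at launch. The sketch below illustrates that all-or-nothing rule in plain Python. It is not Hamster or YARN code: `gang_allocate`, the `free_slots` map, and the node names are hypothetical, and the packing order is an arbitrary choice made for the example.

```python
from typing import Dict, List, Optional


def gang_allocate(free_slots: Dict[str, int], num_procs: int) -> Optional[List[str]]:
    """Return one host per MPI rank, or None if the whole gang does not fit.

    free_slots maps a (hypothetical) node name to its free container slots.
    The input is never mutated, so a failed attempt holds no resources.
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    if sum(free_slots.values()) < num_procs:
        return None  # all-or-nothing: never hand out a partial allocation

    placement: List[str] = []
    # Use nodes with the most free slots first so ranks land on fewer hosts.
    for host, slots in sorted(free_slots.items(), key=lambda kv: (-kv[1], kv[0])):
        take = min(slots, num_procs - len(placement))
        placement.extend([host] * take)
        if len(placement) == num_procs:
            break
    return placement


print(gang_allocate({"n1": 2, "n2": 1}, 3))  # ['n1', 'n1', 'n2']
print(gang_allocate({"n1": 2, "n2": 1}, 4))  # None
```

A real application master receives containers incrementally from the resource manager, so it would have to hold partial grants until the full set arrives. That is one plausible reading of the "(tries to)" on the slide.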
Hamster Architecture
[Diagram components: Client; Resource Manager (Scheduler, AMService); Node Managers, each with Aux Srvcs, a Framework Daemon (NS), and Proc/Containers; MPI AM (MPI Scheduler, HNP). Labeled links: Client-RM, RM-AM, AM-NM, RM-NodeManager.]
Hamster AppMaster
• Master daemon for MPI (similar to JobTracker in MapReduce)
• Implements and participates in the YARN-RM App lifecycle protocol
• Maintains heartbeat with RM to ensure liveness
• MPI Scheduler - Negotiates resource allocation with YARN-RM
• Head Node Process (HNP) - manages job execution
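The heartbeat bullet describes a liveness contract: an application master that stops reporting can be treated as dead. The sketch below shows that bookkeeping in isolation. The `LivenessMonitor` class, its timeout, and the application IDs are hypothetical and are not part of YARN or Hamster, which use their own protocol and configuration.

```python
from typing import Dict, List


class LivenessMonitor:
    """Tracks the last heartbeat time per application master (illustrative only)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, app_id: str, now: float) -> None:
        """Record that app_id reported in at time `now` (seconds)."""
        self._last_seen[app_id] = now

    def expired(self, now: float) -> List[str]:
        """Return the IDs whose last heartbeat is older than the timeout."""
        return sorted(
            app_id
            for app_id, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = LivenessMonitor(timeout_s=10.0)
monitor.heartbeat("am1", now=0.0)
monitor.heartbeat("am2", now=5.0)
print(monitor.expired(now=12.0))  # ['am1']
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic and easy to test.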
Hamster Node Service
• User-level daemon per MPI job
• Manages task execution
• Coarse-grained container management
• Bootstrapped by YARN-NM
• Implemented as YARN Auxiliary Service
Why GraphLab on Hadoop?
• Graph Analytics & Machine Learning only one stage in E2E data pipeline
• ETL/Preprocessing
• Building Graphs from fact & dimension tables
• Publishing analytics results, post-processing
GraphLab 2.2
• Communication patterns based on Data
• Several Toolkits (Graph Analytics + ML Algorithms) available
• Graph-Programming API
• Uses MPI for communication
Pivotal HD
[Diagram of the Pivotal HD Enterprise stack, with a legend distinguishing Apache and Pivotal components]
• HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper, Oozie
• Command Center: Configure, Deploy, Monitor, Manage
• Spring XD
• HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
• GemFire XD – Real-Time Database Services: Distributed In-memory Store, Query Transactions, Ingestion Processing, Hadoop Driver – Parallel with Compaction, ANSI SQL + In-Memory
• MADlib Algorithms
• Virtual Extensions
• GraphLab, Open MPI
Performance
Test Environment
• Pivotal Analytics Workbench Cluster
• Pivotal HD 1.1 (Apache Hadoop 2.0.5)
• Hamster 1.0, OpenMPI 1.7.2
• 515 nodes
• 2x6-core Westmere, 48GB RAM, 12x2TB SATA, Mellanox FDR Infiniband
Null Job
• Measures overhead of launching MPI jobs
• Tests scalability of resource allocation, launching and wire-up
• Sub-linear scalability (slightly worse than O(log N))
• Overhead of launching 15000 processes = 1 minute
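The last two bullets can be made concrete with a little arithmetic. The sketch below computes the amortized launch cost per process from the slide's figure (15000 processes in about 1 minute) and, for comparison, how much an ideal O(log N) cost would grow between two job sizes. The function names and the 1000-process comparison point are assumptions chosen for illustration, not measurements from the talk.

```python
import math


def amortized_launch_ms(total_seconds: float, num_procs: int) -> float:
    """Average launch overhead per process, in milliseconds."""
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    return 1000.0 * total_seconds / num_procs


def ideal_log_growth(n_small: int, n_large: int) -> float:
    """Factor by which an ideal O(log N) cost grows from n_small to n_large."""
    if n_small <= 1 or n_large <= 1:
        raise ValueError("process counts must be greater than 1")
    return math.log(n_large) / math.log(n_small)


# Figure from the slide: 15000 processes launched in about 1 minute.
print(amortized_launch_ms(60.0, 15000))  # 4.0
# Hypothetical comparison point (1000 processes), for illustration only.
print(round(ideal_log_growth(1000, 15000), 2))  # 1.39
```

Under a purely logarithmic model, a 15x larger job would cost only about 1.39x as much to launch. The slide reports scaling slightly worse than that, so the measured growth should sit somewhat above this factor.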
Total Runtime
[Chart: E2E time in seconds (y-axis, 5 to 60) vs. number of processes (x-axis, 0 to 16000)]
Allocation Time
[Chart: allocation time in seconds (y-axis, 1 to 6) vs. number of processes (x-axis, 0 to 16000)]
Launch Time
[Chart: launch time in seconds (y-axis, 0 to 30) vs. number of processes (x-axis, 0 to 16000)]
Comparison with OpenMPI
• HPL (High-Performance Linpack, used for the Top500 list)
• Number of processes: 50–1000
• Hamster 1% slower than OpenMPI
HPL - Hamster vs OpenMPI
[Chart: time in seconds (y-axis, 0 to 120) for 1000, 500, 200, and 50 processes]
GraphLab ALS
• Wikipedia dataset
• 4.3M terms, 3.3M documents, 513M occurrences
• 17 Processes
• 5 Iterations
GraphLab ALS
[Chart: time in seconds (y-axis, 0 to 1340) for Hamster and OpenMPI]
GraphLab PageRank
• Twitter Dataset
• 4.1 M nodes, 1.4 B edges
• Data Size: 26GB
• NP = 17
• 50 iterations: 297 seconds
• 100 iterations: 339 seconds
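The two timings above are enough to separate fixed cost from per-iteration cost, assuming runtime is linear in the number of iterations. That linear model is an assumption and `fit_fixed_and_per_iteration` is a hypothetical helper; with only two data points this is a rough estimate, not a measured breakdown.

```python
from typing import Tuple


def fit_fixed_and_per_iteration(
    iters_a: int, secs_a: float, iters_b: int, secs_b: float
) -> Tuple[float, float]:
    """Solve runtime = fixed + per_iter * iterations from two measurements."""
    if iters_a == iters_b:
        raise ValueError("need two different iteration counts")
    per_iter = (secs_b - secs_a) / (iters_b - iters_a)
    fixed = secs_a - per_iter * iters_a
    return fixed, per_iter


# Timings from the slide: 50 iterations in 297 s, 100 iterations in 339 s.
fixed, per_iter = fit_fixed_and_per_iteration(50, 297.0, 100, 339.0)
print(f"{per_iter:.2f} s/iteration, {fixed:.0f} s fixed")
# prints "0.84 s/iteration, 255 s fixed"
```

Under this model most of the runtime is fixed cost rather than iteration cost. The slides do not say what that fixed portion consists of. Loading and partitioning the 26GB graph is a plausible candidate.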
Questions?