March 2011 HUG: Scaling Hadoop


DESCRIPTION

Scaling Hadoop Applications - Talk by Milind Bhandarkar, LinkedIn

TRANSCRIPT

Page 1: March 2011 HUG: Scaling Hadoop

Scaling Hadoop Applications

Milind Bhandarkar, Distributed Data Systems Engineer

[[email protected]] [twitter: @techmilind]

http://www.flickr.com/photos/theplanetdotcom/4878812549/

Page 2: March 2011 HUG: Scaling Hadoop

About Me

http://www.linkedin.com/in/milindb

Founding member of Hadoop team at Yahoo! [2005-2010]

Contributor to Apache Hadoop & Pig since version 0.1

Built and Led Grid Solutions Team at Yahoo! [2007-2010]

Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, and LinkedIn

http://www.flickr.com/photos/tmartin/32010732/

Page 3: March 2011 HUG: Scaling Hadoop

Outline

Scalability of Applications

Causes of Sublinear Scalability

Best practices

Q & A

http://www.flickr.com/photos/86798786@N00/2202278303/

Page 4: March 2011 HUG: Scaling Hadoop

Importance of Scaling

Zynga’s Cityville: 0 to 100 Million users in 43 days ! (http://venturebeat.com/2011/01/14/zyngas-cityville-grows-to-100-million-users-in-43-days/)

Facebook in 2009 : From 150 Million users to 350 Million ! (http://www.facebook.com/press/info.php?timeline)

LinkedIn in 2010: Adding a member per second ! (http://money.cnn.com/2010/11/17/technology/linkedin_web2/index.htm)

Public WWW size: 26M in 1998, 1B in 2000, 1T in 2008 (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html)

Scalability allows dealing with success gracefully !

http://www.flickr.com/photos/palmdiscipline/2784992768/

Page 5: March 2011 HUG: Scaling Hadoop

Explosion of Data

(Data-Rich Computing theme proposal. J. Campbell, et al, 2007)

http://www.flickr.com/photos/28634332@N05/4128229979/

Page 6: March 2011 HUG: Scaling Hadoop

Hadoop

Simplifies building parallel applications

Programmer specifies Record-level and Group-level sequential computation

Framework handles partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting etc.

User provides hints to control parallel execution
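For concreteness, a minimal sketch of what "record-level" and "group-level" sequential computation look like in the Java MapReduce API (a word-count-style example; the class names are illustrative, not from the talk):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Record-level computation: the framework calls map() once per input record.
    class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);

      @Override
      protected void map(LongWritable offset, Text line, Context ctx)
          throws IOException, InterruptedException {
        for (String word : line.toString().split("\\s+")) {
          if (!word.isEmpty()) {
            ctx.write(new Text(word), ONE);
          }
        }
      }
    }

    // Group-level computation: the framework calls reduce() once per key group,
    // after partitioning, shuffling, and sorting the map output.
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
          sum += c.get();
        }
        ctx.write(word, new IntWritable(sum));
      }
    }

Everything else on the slide - partitioning, scheduling, dispatch, failure handling - is supplied by the framework around these two sequential functions.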

Page 7: March 2011 HUG: Scaling Hadoop

Scalability of Parallel Programs

If one node processes k MB/s, then N nodes should process (k*N) MB/s

If some fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes

Linear Scalability

http://www-03.ibm.com/innovation/us/watson/

Page 8: March 2011 HUG: Scaling Hadoop

Reduce Latency

Minimize program execution time

http://www.flickr.com/photos/adhe55/2560649325/

Page 9: March 2011 HUG: Scaling Hadoop

Increase Throughput

Maximize data processed per unit time

http://www.flickr.com/photos/mikebaird/3898801499/

Page 10: March 2011 HUG: Scaling Hadoop

Three Equations

http://www.flickr.com/photos/ucumari/321343385/

Page 11: March 2011 HUG: Scaling Hadoop

Amdahl’s Law

$$S_N = \frac{N}{1 + \alpha\,(N - 1)}$$

where α here denotes the serial (non-parallelizable) fraction of the work and N the number of workers.

http://www.computer.org/portal/web/awards/amdahl
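For intuition, with an assumed serial fraction of 5% (α = 0.05) and N = 100 workers:

$$S_{100} = \frac{100}{1 + 0.05 \times 99} = \frac{100}{5.95} \approx 16.8$$

The serial fraction, not the worker count, bounds the achievable speedup.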

Page 12: March 2011 HUG: Scaling Hadoop

Multi-Phase Computations

If computation C is split into N different parts, C1..CN

If each partial computation Ci can be sped up by a factor of Si

http://www.computer.org/portal/web/awards/amdahl

Page 13: March 2011 HUG: Scaling Hadoop

Amdahl’s Law, Restated

$$S = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} \dfrac{C_i}{S_i}}$$

(the overall speedup is the total work divided by the sum of the per-phase times after each phase i is sped up by Si)

http://www.computer.org/portal/web/awards/amdahl
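A worked example with assumed numbers: a job spends C1 = 90 minutes in a phase that parallelizes with S1 = 30, and C2 = 10 minutes in a phase that does not (S2 = 1):

$$S = \frac{90 + 10}{\frac{90}{30} + \frac{10}{1}} = \frac{100}{13} \approx 7.7$$

The phase that is not sped up dominates the overall speedup.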

Page 14: March 2011 HUG: Scaling Hadoop

MapReduce Workflow

http://www.computer.org/portal/web/awards/amdahl

Page 15: March 2011 HUG: Scaling Hadoop

Little’s Law

$$L = \lambda W$$

where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system.

http://sloancf.mit.edu/photo/facstaff/little.gif
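An illustrative application to a cluster (numbers assumed): if tasks are scheduled at λ = 50 tasks per second and each takes W = 40 seconds on average, the system must hold L = 50 × 40 = 2000 tasks in flight. Equivalently, with a fixed number of slots L, throughput is capped at λ = L / W, so reducing task latency raises throughput.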

Page 16: March 2011 HUG: Scaling Hadoop

Message Cost Model

$$T = \alpha + \frac{N}{\beta}$$

where α is the per-message latency, β the bandwidth, and N the message size in bytes.

http://www.flickr.com/photos/mattimattila/4485665105/

Page 17: March 2011 HUG: Scaling Hadoop

Message Granularity

For Gigabit Ethernet: α = 300 μs, β = 100 MB/s

100 Messages of 10KB each = 40 ms

10 Messages of 100 KB each = 13 ms
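Both figures follow from the cost model on the previous slide, using the stated α and β:

$$100 \times \left(300\,\mu\mathrm{s} + \frac{10\,\mathrm{KB}}{100\,\mathrm{MB/s}}\right) = 100 \times 400\,\mu\mathrm{s} = 40\,\mathrm{ms}$$

$$10 \times \left(300\,\mu\mathrm{s} + \frac{100\,\mathrm{KB}}{100\,\mathrm{MB/s}}\right) = 10 \times 1.3\,\mathrm{ms} = 13\,\mathrm{ms}$$

Moving the same 1 MB in fewer, larger messages amortizes the per-message latency α.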

http://www.flickr.com/photos/philmciver/982442098/

Page 18: March 2011 HUG: Scaling Hadoop

Alpha-Beta

Common Mistake: Assuming that α is constant

Scheduling latency for the responder

JobTracker/TaskTracker time slice is inversely proportional to the number of concurrent tasks

Common Mistake: Assuming that β is constant

Network congestion

TCP incast

Page 19: March 2011 HUG: Scaling Hadoop

Causes of Sublinear Scalability

http://www.flickr.com/photos/sabertasche2/2609405532/

Page 20: March 2011 HUG: Scaling Hadoop

Sequential Bottlenecks

Mistake: A single reducer (the default); set mapred.reduce.tasks, or use the PARALLEL directive in Pig (see the sketch below)

Mistake: A constant number of reducers, independent of data size (e.g., relying on a fixed default_parallel in Pig)
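A minimal sketch of sizing the reduce phase explicitly (assuming the 0.20-era Java MapReduce API; the job name and reducer count are illustrative and should be derived from input size and cluster capacity):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerSizing {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "scaling-example"); // illustrative job name

        // Do not rely on the single-reducer default; size the reduce phase
        // to the data. Equivalent knobs: -Dmapred.reduce.tasks=32 on the
        // command line, or PARALLEL / default_parallel in Pig.
        job.setNumReduceTasks(32); // illustrative value
      }
    }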

http://www.flickr.com/photos/bogenfreund/556656621/

Page 21: March 2011 HUG: Scaling Hadoop

Load Imbalance

Unequal Partitioning of Work

Imbalance = Max Partition Size – Min Partition Size

Heterogeneous workers; example: disk bandwidth

Empty disk: 100 MB/s; 75% full disk: 25 MB/s

http://www.flickr.com/photos/kelpatolog/176424797/

Page 22: March 2011 HUG: Scaling Hadoop

Over-Partitioning

For N workers, choose M partitions, M >> N

Adaptive scheduling: each worker chooses the next unscheduled partition as it becomes free

Load Imbalance = Max(Sum(Wk)) – Min(Sum(Wk))

For sufficiently large M, both the Max and Min per-worker sums tend toward the average per-worker load, thus reducing the imbalance

http://www.flickr.com/photos/infidelic/3238637031/

Page 23: March 2011 HUG: Scaling Hadoop

Critical Paths

Maximum distance between start and end nodes

Long tail in map task executions; enable speculative execution (see the sketch below)

Overlap communication and computation via asynchronous notifications; example: deleting intermediate outputs after job completion
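A minimal configuration sketch for enabling speculative execution (using the 0.20-era property names; some distributions already enable this by default):

    import org.apache.hadoop.conf.Configuration;

    public class SpeculativeExecution {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Re-launch straggling tasks on other nodes so that one slow map or
        // reduce does not stretch the job's critical path.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
      }
    }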

http://www.flickr.com/photos/cdevers/4619306252/

Page 24: March 2011 HUG: Scaling Hadoop

Algorithmic Overheads

Repeating computation in place of adding communication

Parallel exploration of the solution space results in wasted computation

Need for a coordinator

Share partial, unmaterialized results; consider small, memory-based, replicated K-V stores with eventual consistency

http://www.flickr.com/photos/sbprzd/140903299/

Page 25: March 2011 HUG: Scaling Hadoop

Synchronization

Shuffle stage in Hadoop: M × R transfers (M map tasks, R reduce tasks)

Slowest system components involved: Network, Disk

Can you eliminate the Reduce stage?

Use combiners?

Compress intermediate output?

Launch reduce tasks before all maps finish
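A sketch of the shuffle-side knobs mentioned above (0.20-era property names; the slow-start threshold is illustrative, and the combiner class named in the comment is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ShuffleTuning {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress map output so the M*R shuffle transfers move fewer bytes.
        conf.setBoolean("mapred.compress.map.output", true);
        // Let reduce tasks start fetching before every map has finished
        // (here: once 80% of maps are done -- an illustrative threshold).
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);

        Job job = new Job(conf, "shuffle-tuning-example"); // illustrative name
        // If the reduce function is associative and commutative, reuse it as a
        // combiner to aggregate locally before the shuffle:
        // job.setCombinerClass(MyReducer.class); // MyReducer is hypothetical
      }
    }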

http://www.flickr.com/photos/susan_d_p/2098849477/

Page 26: March 2011 HUG: Scaling Hadoop

Task Granularity

Amortize task scheduling and launching overheads

Centralized scheduler: out-of-band TaskTracker heartbeats reduce latency, but may overwhelm the scheduler

Increase task granularity: combine multiple small files, maximize split sizes, and combine computations per datum (see the split-size sketch below)
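A sketch of coarsening splits via configuration (0.20-era property name; the 1 GB value is illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class CoarseSplits {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Ask for splits of at least 1 GB so each map task amortizes its
        // scheduling and launch overhead over more data.
        conf.setLong("mapred.min.split.size", 1024L * 1024 * 1024);
        // For directories of many small files, a CombineFileInputFormat-based
        // input format can pack several files into a single split.
      }
    }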

http://www.flickr.com/photos/philmciver/982442098/

Page 27: March 2011 HUG: Scaling Hadoop

Hadoop Best Practices

Use higher-level languages (e.g. Pig)

Coalesce small files into larger ones & use bigger HDFS block size

Tune buffer sizes for tasks to avoid spilling to disk (see the sketch below)

Consume only one slot per task

Use combiner if local aggregation reduces size
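A sketch of the buffer and block-size settings above (0.20-era property names; the values are illustrative and should be sized against the task heap and input size):

    import org.apache.hadoop.conf.Configuration;

    public class TaskBuffers {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // A larger map-side sort buffer means fewer spills to local disk.
        conf.setInt("io.sort.mb", 256);
        conf.setFloat("io.sort.spill.percent", 0.90f);
        // Bigger HDFS blocks give fewer, larger splits for coalesced files.
        conf.setLong("dfs.block.size", 256L * 1024 * 1024);
      }
    }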

http://www.flickr.com/photos/yaffamedia/1388280476/

Page 28: March 2011 HUG: Scaling Hadoop

Hadoop Best Practices

Use compression everywhere: CPU-efficient codecs for intermediate data, disk-efficient codecs for output data

Use DistributedCache to distribute small side-files (much smaller than the input split size), as sketched below

Minimize number of NameNode/JobTracker RPCs from tasks
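A minimal sketch of shipping a small side-file with the DistributedCache (the HDFS path is illustrative):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class SideFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ship a small lookup file to every task node once, instead of having
        // every task open it on HDFS (and hit the NameNode) independently.
        DistributedCache.addCacheFile(new URI("/tmp/side/lookup.dat"), conf); // illustrative path
      }
    }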

http://www.flickr.com/photos/yaffamedia/1388280476/

Page 29: March 2011 HUG: Scaling Hadoop

Questions?

http://www.flickr.com/photos/horiavarlan/4273168957/