Hadoop distributions: bottlenecks and tuning

Post on 10-May-2015

DESCRIPTION

This presentation by Alexey Diomin, R&D Engineer at Altoros, explains how to spot performance bottlenecks in Hadoop and reviews five approaches to eliminating them.

TRANSCRIPT

Hadoop distributions. Bottlenecks and tuning.

Diomin Aliaksey, R&D

2014, Minsk

Hadoop Matrix

Distribution    Open Source   Monitoring   Target Group
Apache Hadoop   Yes           X            Developers
Cloudera        Yes           Good         All
Hortonworks     Yes           Good         All
MapR            No            Bad          Enterprise
PivotalHD       No            Bad          Enterprise

How to find the bottleneck?

Monitoring & Logs

Brain

All stages

Map stage

Fetch stage

Merge stage

All stages

All stages

The most popular approaches

1. Increase size of cluster

2. Increase input block size

3. Increase buffer size

Small cluster, slow tasks

We need more gold ……

Large cluster, slow tasks

Increase input block size

Other techniques

1. Compression

2. Combiner

Combiner

Wordcount: the reduce function is reused as the combiner

combine 1: <a, 1> <b, 1> <a, 1> => <a, 2> <b, 1>

combine 2: <a, 1> <b, 1> => <a, 1> <b, 1>

Reduce: <a, {1, 2}> <b, {1, 1}> => <a, 3> <b, 2>
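The wordcount trace above can be replayed in a few lines of plain Python. This is only a sketch of the idea, not Hadoop code: the helper names `map_words` and `sum_counts` and the two input splits are hypothetical, chosen to match the slide. It shows that summing is safe to reuse as a combiner because the final result is the same with or without the combine step.

```python
from collections import Counter


def map_words(text):
    """Map step: emit a <word, 1> pair for every word in the split."""
    return [(word, 1) for word in text.split()]


def sum_counts(pairs):
    """Reduce step, reused unchanged as the combiner: sum values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


# Two hypothetical input splits that reproduce the pairs on the slide.
split_1 = "a b a"
split_2 = "a b"

# Combine runs on each mapper's local output before the shuffle.
combined_1 = sum_counts(map_words(split_1))
combined_2 = sum_counts(map_words(split_2))

# Reduce sees the pre-aggregated pairs from both mappers.
with_combiner = sum_counts(combined_1 + combined_2)

# Without a combiner, reduce sees every raw <word, 1> pair.
without_combiner = sum_counts(map_words(split_1) + map_words(split_2))

print(combined_1, combined_2)
print(with_combiner, without_combiner)
```

In this toy run the combiner shrinks the shuffled data from five pairs to four; on real inputs with many repeated keys the reduction is usually much larger.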

Combiner

Mean: reusing the reduce function as the combiner gives the wrong result

combine 1: <k,40> <k,30> <k,20> => <k, 30>

combine 2: <k,2> <k,8> => <k, 5>

Reduce: <k, {30, 5}> => <k, 17.5>

True mean: (40 + 30 + 20 + 2 + 8)/5 = 20, not 17.5

Combiner

Mean: the combiner emits <sum, count> pairs

combine 1: <k,<40,1>> <k,<30,1>> <k,<20,1>> => <k,<90,3>>

combine 2: <k,<2,1>> <k,<8,1>> => <k,<10,2>>

Reduce: <k, {<90,3>, <10,2>}> => <k, (90 + 10)/(3 + 2)> = <k, 20>
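The difference between the two mean variants can be checked with a small Python sketch. It is an illustration under assumptions rather than Hadoop code: the function names `mean`, `combine_pairs` and `reduce_mean` are hypothetical, and the two lists stand in for the values seen by two separate mappers for the same key.

```python
def mean(values):
    """Plain arithmetic mean; NOT safe to reuse as a combiner."""
    return sum(values) / len(values)


def combine_pairs(pairs):
    """Combiner: merge (sum, count) pairs into one (sum, count) pair."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return (total, count)


def reduce_mean(pairs):
    """Reducer: merge all partial (sum, count) pairs, then divide once."""
    total, count = combine_pairs(pairs)
    return total / count


# Values for key k as seen by two hypothetical mappers.
split_1 = [40, 30, 20]
split_2 = [2, 8]

# Naive approach: mean of per-mapper means (30 and 5).
naive = mean([mean(split_1), mean(split_2)])

# Pair approach: each value starts as (value, 1), combiners add them up.
partial_1 = combine_pairs([(v, 1) for v in split_1])
partial_2 = combine_pairs([(v, 1) for v in split_2])
correct = reduce_mean([partial_1, partial_2])

print(naive)                 # 17.5
print(partial_1, partial_2)  # (90, 3) (10, 2)
print(correct)               # 20.0
```

The pair version works because adding sums and adding counts are both associative and commutative, so it does not matter how the values are grouped across mappers; the division happens only once, in the reducer.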
