Transcript
Page 1: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Benchmarking Accumulo: How Fast is Fast?

Mike Drob

Software Engineer, Cloudera

Page 2: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Me

• Cloudera Engineer

• Accumulo Committer

• Perpetual Tinkerer

Victor Grigas CC-BY-SA 3.0

Page 3: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Agenda

• Methodology

• Accumulo 1.4 to 1.6

• Accumulo to HBase

• Conclusions

Reuvenk CC-BY-SA 2.5

Page 4: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Methodology

• Measuring Performance

• Task Latency (time)

• Throughput (bps)

• Workloads

• Read

• Write

• Mixed
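
The two measurements named above can be sketched in a few lines. This is a minimal illustration, not the harness used for the talk: the `measure` function, its `op` callback (which must return the number of bytes it moved), and the injectable `clock` are all hypothetical names introduced here.

```python
import time
from typing import Callable, List, Tuple


def measure(op: Callable[[], int], iterations: int,
            clock: Callable[[], float] = time.perf_counter
            ) -> Tuple[List[float], float]:
    """Run op() repeatedly and report latency and throughput.

    op is a stand-in for one read or write task and returns the number
    of bytes it transferred. Returns (per-task latencies in seconds,
    aggregate throughput in MB/sec, with 1 MB taken as 10**6 bytes).
    """
    latencies = []
    total_bytes = 0
    start = clock()
    for _ in range(iterations):
        t0 = clock()
        total_bytes += op()
        latencies.append(clock() - t0)   # task latency
    elapsed = clock() - start
    throughput = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return latencies, throughput


# Stand-in task: copying a 1 MB buffer in memory (not a real database call).
payload = bytes(10**6)
latencies, mb_per_sec = measure(lambda: len(bytearray(payload)), iterations=5)
print(f"mean latency: {sum(latencies) / len(latencies):.6f} s")
print(f"throughput:   {mb_per_sec:.1f} MB/sec")
```

Latency is recorded per task while throughput is computed over the whole run, so loop overhead between tasks counts against throughput but not against latency.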

AngMoKio CC-BY-SA 2.5

Page 5: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Methodology

• Yahoo! Cloud Serving Benchmark

• Workloads

• Connectors

• Highly configurable

• # of Rows/Columns

• Size of Value

• # of Threads

• Parallelizable number of clients
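
As a rough illustration of those knobs, the sketch below renders a YCSB workload file from Python. The property names are those of YCSB's CoreWorkload; the class name uses the `com.yahoo.ycsb` package of releases from that period, and every value shown is a made-up example, not a setting from these benchmarks. The helper `ycsb_workload` is hypothetical.

```python
def ycsb_workload(record_count: int, field_count: int, field_length: int,
                  read_proportion: float, threads: int) -> str:
    """Render a YCSB CoreWorkload properties file as text.

    record_count -> # of rows, field_count -> # of columns,
    field_length -> size of each value in bytes, threads -> client threads.
    Whatever is not read is issued as updates (writes).
    """
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": record_count,
        "fieldcount": field_count,
        "fieldlength": field_length,
        "readproportion": read_proportion,
        "updateproportion": round(1.0 - read_proportion, 6),
        "threadcount": threads,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Example values only: a 50/50 mixed workload.
print(ycsb_workload(record_count=1000, field_count=2000,
                    field_length=100, read_proportion=0.5, threads=5))
```

Running several such clients at once, each with its own thread count, is how the load is parallelized beyond a single machine.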

Sfoskett CC BY-SA 3.0

Page 6: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

Page 7: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

• Accumulo 1.4.4-cdh4.5.0

• Accumulo 1.6.0-cdh4.6.0-beta-1

• YCSB 0.14+50

• 80 node cluster

• 10 clients

• 5 racks

Public Domain via USAF

Page 8: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

The Data:

• 200 GB

• 2k Columns

• Pre-Split Table 80x

• Vary # of rows

• Vary value size

(we actually did a lot more,

but it was hard to graph)
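
If the total is held at 200 GB with 2k columns, the value size follows from the row count. The arithmetic below is a sketch under assumptions the slide does not state: decimal gigabytes, "2k" meaning exactly 2000 columns, and key and timestamp overhead ignored. The row counts are the tick marks on the charts' # of Rows axis.

```python
TOTAL_BYTES = 200 * 10**9   # assumption: 200 GB in decimal gigabytes
COLUMNS = 2000              # assumption: "2k" means exactly 2000


def value_size_bytes(rows: int, columns: int = COLUMNS,
                     total: int = TOTAL_BYTES) -> int:
    """Bytes per value so that rows * columns values add up to total."""
    return total // (rows * columns)


for rows in (10, 100, 1000, 10000, 100000):
    print(f"{rows:>6} rows -> {value_size_bytes(rows):>10} bytes per value")
```

Under these assumptions 10 rows means 10 MB values and 100000 rows means 1000-byte values, so moving along the row axis is also a sweep from a few huge cells to many small ones.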

Morio CC BY-SA 3.0

Page 9: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

[Chart: Aggregate Read. y-axis: Throughput (MB/sec), 0 to 1400; x-axis: # of Rows, 10 to 100000; series: Accumulo 1.4, Accumulo 1.6]

Page 10: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

[Chart: Aggregate Mixed. y-axis: Throughput (MB/sec), 0 to 2000; x-axis: # of Rows, 10 to 100000; series: Accumulo 1.4, Accumulo 1.6]

Page 11: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

[Chart: Aggregate Write. y-axis: Throughput (MB/sec), 0 to 450; x-axis: # of Rows, 10 to 100000; series: Accumulo 1.4, Accumulo 1.6]

Page 12: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

• Write speed improved!

• Read speed about the same.

• Something weird happens writing 1000 rows.

Christopher Foster CC BY-SA 3.0

Page 13: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo across versions

So, what happens at 1000 rows…? Nothing.

[Chart: y-axis: Throughput (MB/sec), 100 to 700; x-axis: # of Rows, 10 to 100000]

Problem is at 100 rows.

Page 14: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

Page 15: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

• Accumulo 1.6.0-cdh4.6.0-beta-1

• HBase 0.94.15-cdh4.6.0

• YCSB 0.14+50

• 5 worker nodes

• 5 split points

• 5G Heap, 3G mem map

Abdullah AlBargan CC BY-ND 2.0

Page 16: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

• Single client (5 threads)

• Workload sizes

• In memory (15G)

• Force disk activity (30G)

• Constant # of rows

• Vary # of columns

• Activity

• 100% Write

• 100% Read

nahtanoj CC-BY-2.0

Page 17: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

[Chart: Reading 15GB (500 rows). y-axis: Throughput (MB/sec), 0 to 600; x-axis: # of columns, 100 to 1000000; series: Accumulo, HBase]

Page 18: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

[Chart: Reading 30GB (1000 rows). y-axis: Throughput (MB/sec), 0 to 600; x-axis: # of columns, 100 to 1000000; series: Accumulo, HBase]

Page 19: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

[Chart: Writing 15GB (500 rows). y-axis: Throughput (MB/sec), 0 to 80; x-axis: # of columns, 100 to 1000000; series: Accumulo, HBase]

Page 20: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Accumulo and HBase

[Chart: Writing 30GB (1000 rows). y-axis: Throughput (MB/sec), 0 to 80; x-axis: # of columns, 100 to 1000000; series: Accumulo, HBase]

Page 21: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Performance Tweaks

Page 22: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Performance Tweaks – Client Side

• Number of rows/columns

• Batch Writer Threads

• Batch Writer Buffer Size

• Use large buffer for small values

• Use small buffer for large values

• ACCUMULO-2766 possible fix

Public Domain via USN

Page 23: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Performance Tweaks – Server Side

• Apply table splits liberally

• Increase automatic split threshold

• Some properties to play with:

• table.compaction.minor.logs.threshold

• tserver.compaction.minor.concurrent.max

• tserver.walog.max.size

• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
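
For the first bullet, split points can be generated ahead of time and written one per line to a file. The sketch below assumes row keys start with a fixed-width decimal prefix that is spread evenly over its range, which may not match a given table's real keys. The helper `even_split_points` and the file name `splits.txt` are hypothetical.

```python
from typing import List


def even_split_points(num_tablets: int, width: int = 4) -> List[str]:
    """Return num_tablets - 1 zero-padded split points.

    Assumes keys begin with a `width`-digit decimal prefix distributed
    evenly over 0 .. 10**width - 1, so equal slices of that range give
    roughly equal tablets.
    """
    space = 10 ** width
    return [str(i * space // num_tablets).zfill(width)
            for i in range(1, num_tablets)]


# 80 tablets, one per node on an 80 node cluster, needs 79 split points.
points = even_split_points(80)
with open("splits.txt", "w") as handle:
    handle.write("\n".join(points) + "\n")
print(len(points), points[:3], points[-1])
```

A file in this one-split-per-line shape is what the Accumulo shell's `addsplits` command is understood to accept through its splits-file option; check the shell help for the exact flag in the version in use.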

Page 24: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Thank You! Please visit our booth!

Mike Drob – [email protected]

@mikhaildrob

