accumulo summit 2014: four orders of magnitude: running large scale accumulo clusters

Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014

Post on 21-Oct-2014


DESCRIPTION

Speaker: Aaron Cordova. Most users of Accumulo start developing applications on a single machine and want to scale up to four orders of magnitude more machines without having to rewrite. In this talk we describe techniques for designing applications for scale, planning a large scale cluster, tuning the cluster for high speed ingest, dealing with a large amount of data over time, and unique features of Accumulo for taking advantage of up to ten thousand nodes in a single instance. We also include the largest public metrics gathered on Accumulo clusters to date and a discussion of overcoming practical limits to scaling in the future.

TRANSCRIPT

Page 1: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Aaron Cordova Accumulo Summit, June 2014

Page 2: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scale, Security, Schema

Page 3: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scale

Page 4: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

to scale¹ - (vt) to change the size of something

Page 5: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

“let’s scale the cluster up to twice the original size”

Page 6: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

to scale² - (vi) to function properly at a large scale

Page 7: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

“Accumulo scales”

Page 8: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

What is Large Scale?

Page 9: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Notebook Computer

• 16 GB DRAM

• 512 GB Flash Storage

• 2.3 GHz quad-core i7 CPU

Page 10: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Modern Server

• 100s of GB DRAM

• 10s of TB on disk

• 10s of cores

Page 11: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Large Scale

[Chart: data held in RAM and on disk, on a scale from 10 GB to 100 PB, for a laptop, a server, a 10-node cluster, 100 nodes, 1,000 nodes, and 10,000 nodes]

Page 12: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Data Composition

[Chart: January through April; series: Original Raw, Derivative, QFDs, Indexes; vertical axis 0 to 180]

Page 13: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo Scales

• From GB to PB, Accumulo keeps two things low:

• Administrative effort

• Scan latency

Page 14: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scan Latency

[Chart: vertical axis 0 to 0.05; horizontal axis 0 to 1000]

Page 15: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Administrative Overhead

[Chart: series: Failed Machines, Admin Intervention; vertical axis 0 to 12; horizontal axis 0 to 1000]

Page 16: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo Scales

• From GB to PB three things grow linearly:

• Total storage size

• Ingest Rate

• Concurrent scans

Page 17: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Ingest Benchmark

[Chart: millions of entries per second (0 to 100); horizontal axis 0 to 1000]

Page 18: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

AWB Benchmark

http://sqrrl.com/media/Accumulo-Benchmark-10312013-1.pdf

Page 19: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1000 machines

Page 20: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

100 M entries written per second

Page 21: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

408 terabytes

Page 22: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

7.56 trillion total entries

Page 23: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Graph Benchmark

http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Page 24: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1200 machines

Page 25: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

4.4 trillion vertices

Page 26: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

70.4 trillion edges

Page 27: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

149 M edges traversed per second

Page 28: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1 petabyte

Page 29: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Graph Analysis

Billions of Edges (log scale)

Twitter 1.5

Yahoo! 6.6

Facebook 1,000

Accumulo 70,000

Page 30: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo is designed after Google’s BigTable

Page 31: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

BigTable powers hundreds of applications at Google

Page 32: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

BigTable serves 2+ exabytes

http://hbasecon.com/sessions/#session33

Page 33: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

600 M queries per second organization wide

Page 34: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

From 10 to 10,000

Page 35: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Starting with ten machines: 10^1

Page 36: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

One rack

Page 37: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1 TB RAM

Page 38: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

10-100 TB Disk

Page 39: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Hardware failures rare

Page 40: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Test Application Designs

Page 41: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Designing Applications for Scale

Page 42: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

1. Live writes go to all servers

2. User requests are satisfied by few scans

3. Turning updates into inserts

Page 43: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

Writes on all servers Few Scans

Page 44: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Hash / UUID Keys

RowID Col Value

af362de4 Bob

b23dc4be Annie

b98de2ff Joe

c48e2ade $30

c7e43fb2 $25

d938ff3d 32

e2e4dac4 59

e98f2eab3 43

Key Value

userA:name Bob

userA:age 43

userA:account $30

userB:name Annie

userB:age 32

userB:account $25

userC:name Joe

userC:age 59

Uniform writes

Page 45: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Monitor

Participating Tablet Servers

MyTable

Servers Hosted Tablets … Ingest

r1n1 1500 200k

r1n2 1501 210k

r2n1 1499 190k

r2n2 1500 200k

Page 46: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Hash / UUID Keys

RowID Col Value

af362de4 Bob

b23dc4be Annie

b98de2ff Joe

c48e2ade $30

c7e43fb2 $25

d938ff3d 32

e2e4dac4 59

e98f2eab3 43

3 x 1-entry scans on 3 servers

get(userA)
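A minimal sketch of the trade-off on the two Hash / UUID Keys slides, not the speaker's code. It assumes an MD5 digest truncated to 8 hex digits as the row ID (the slides do not say which hash is used) and a hypothetical four-server cluster that splits the key space evenly; `hashed_row_id` and `server_for` are illustrative names.

```python
import hashlib
from collections import Counter


def hashed_row_id(key: str) -> str:
    """Hypothetical row ID: first 8 hex digits of the MD5 of the logical key."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]


def server_for(row_id: str, num_servers: int = 4) -> int:
    """Assumed range partitioning: the hex key space is split evenly across servers."""
    return int(row_id, 16) * num_servers // 16**8


# Writes for many users land on every server in roughly equal numbers.
keys = [f"user{i}:{field}" for i in range(1000) for field in ("name", "age", "account")]
load = Counter(server_for(hashed_row_id(k)) for k in keys)

# Reading one user needs one lookup per field, each at an unrelated row ID.
lookups_for_user = [hashed_row_id(f"user7:{field}") for field in ("name", "age", "account")]
```

Writes are uniform, but every field of a user hashes to a different place, which is why the slide counts three one-entry scans for a single `get(userA)`.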

Page 47: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

Writes on all servers Few Scans

Hash / UUID Keys

Page 48: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Group for Locality

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

userA:name Bob

userA:age 43

userA:account $30

userB:name Annie

userB:age 32

userB:account $25

userC:name Joe

userC:age 59

RowID Col Value

af362de4 name Annie

af362de4 age 32

af362de4 account $25

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

Still fairly uniform writes

Page 49: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Group for Locality

RowID Col Value

af362de4 name Annie

af362de4 age 32

af362de4 account $25

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

1 x 3-entry scan on 1 server

get(userA)
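The grouped layout can be sketched the same way. This is an illustration under assumptions: the row ID is a truncated MD5 of the user name (the slides do not name the hash), the table is a sorted in-memory list standing in for a tablet, and `row_id`, `entries_for`, and `get` are hypothetical helpers.

```python
import bisect
import hashlib


def row_id(user: str) -> str:
    """Hypothetical row ID: hash only the user, so all of a user's fields share a row."""
    return hashlib.md5(user.encode("utf-8")).hexdigest()[:8]


def entries_for(user: str, fields: dict) -> list:
    """One (row ID, column, value) entry per field, all under the same row ID."""
    rid = row_id(user)
    return [(rid, col, val) for col, val in fields.items()]


# A sorted list stands in for the sorted key space of a table.
table = sorted(
    entries_for("userA", {"name": "Bob", "age": "43", "account": "$30"})
    + entries_for("userB", {"name": "Annie", "age": "32", "account": "$25"})
    + entries_for("userC", {"name": "Joe", "age": "59"})
)


def get(table: list, user: str) -> dict:
    """A single contiguous scan: seek to the row, read until the row ID changes."""
    rid = row_id(user)
    start = bisect.bisect_left(table, (rid,))
    result = {}
    for r, col, val in table[start:]:
        if r != rid:
            break
        result[col] = val
    return result
```

Because the hash covers only the user, different users still spread across the key space, while one user's entries sit next to each other and come back in one scan.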

Page 50: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

Writes on all servers Few Scans

Grouped Keys

Page 51: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

RowID Col Value

20140101 44

20140102 22

20140103 23

Page 52: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

RowID Col Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

Page 53: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

20140106 27

20140107 25

20140108 17

RowID Col Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

20140106 27

20140107 25

20140108 17

Always write to one server

Page 54: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

No write parallelism

Page 55: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Temporal Keys

RowID Col Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

20140106 27

20140107 25

20140108 17

Fetching ranges uses few scans

get(20140101 to 201404)

Page 56: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

Writes on all servers Few Scans

Temporal Keys

Page 57: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Binned Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

RowID Col Value

0_20140101 44

1_20140102 22

2_20140103 23

Uniform Writes

Page 58: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Binned Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

20140106 27

RowID Col Value

0_20140101 44

0_20140104 25

1_20140102 22

1_20140105 31

2_20140103 23

2_20140106 27

Uniform Writes

Page 59: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Binned Temporal Keys

Key Value

userA:name Bob

userA:age 43

userB:name Annie

userB:age 32

userC:name Fred

userC:age 29

userD:name Joe

userD:age 59

Key Value

20140101 44

20140102 22

20140103 23

20140104 25

20140105 31

20140106 27

20140107 25

20140108 17

RowID Col Value

0_20140101 44

0_20140104 25

0_20140107 25

1_20140102 22

1_20140105 31

1_20140108 17

2_20140103 23

2_20140106 27

Uniform Writes

Page 60: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Binned Temporal Keys

RowID Col Value

0_20140101 44

0_20140104 25

0_20140107 25

1_20140102 22

1_20140105 31

1_20140108 17

2_20140103 23

2_20140106 27

One scan per bin

get(20140101 to 201404)
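A sketch of binned temporal keys with three bins. The slides do not say how a bin is chosen; this version assumes day-of-year modulo the bin count, which reproduces the row IDs shown above for early January. `binned_row_id` and `scan_range` are illustrative names, and a sorted list stands in for the table.

```python
import bisect
from datetime import datetime

NUM_BINS = 3


def binned_row_id(day: str, num_bins: int = NUM_BINS) -> str:
    """Prefix the date with a bin number (assumed rule: day of year modulo bins)."""
    yday = datetime.strptime(day, "%Y%m%d").timetuple().tm_yday
    return f"{(yday - 1) % num_bins}_{day}"


def scan_range(sorted_rows: list, start_day: str, end_day: str,
               num_bins: int = NUM_BINS) -> list:
    """Run one inclusive range scan per bin, then merge results back into date order."""
    hits = []
    for b in range(num_bins):
        lo = bisect.bisect_left(sorted_rows, (f"{b}_{start_day}",))
        hi = bisect.bisect_right(sorted_rows, (f"{b}_{end_day}", float("inf")))
        hits.extend(sorted_rows[lo:hi])
    return sorted((row.split("_", 1)[1], value) for row, value in hits)


daily_values = {
    "20140101": 44, "20140102": 22, "20140103": 23, "20140104": 25,
    "20140105": 31, "20140106": 27, "20140107": 25, "20140108": 17,
}
rows = sorted((binned_row_id(day), value) for day, value in daily_values.items())
first_four_days = scan_range(rows, "20140101", "20140104")
```

The number of bins is the dial: more bins means more servers taking writes at once, and also more scans per range query.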

Page 61: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

Writes on all servers Few Scans

Binned Temporal Keys

Page 62: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Keys to Scaling

• Key design is critical

• Group data under common row IDs to reduce scans

• Prepend bins to row IDs to increase write parallelism

Page 63: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Splits

• Pre-split or organic splits

• Going from dev to production, can ingest a representative sample, obtain split points and use them to pre-split a larger system

• Hundreds or thousands of tablets per server is ok

• Want at least one tablet per server
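One way to derive pre-split points from a representative sample is to take evenly spaced quantiles of the sorted sample row IDs. This is a sketch of that idea only; `split_points` is a hypothetical helper, and applying the resulting points to a real table is left out.

```python
def split_points(sample_row_ids, num_tablets: int) -> list:
    """Pick num_tablets - 1 evenly spaced row IDs from a sorted, de-duplicated sample."""
    rows = sorted(set(sample_row_ids))
    if num_tablets < 2 or len(rows) < num_tablets:
        return []  # nothing useful to split on
    step = len(rows) / num_tablets
    return [rows[int(i * step)] for i in range(1, num_tablets)]


# Example: a sample of 1000 hex row IDs split into 4 tablets.
sample = [f"{i:08x}" for i in range(1000)]
points = split_points(sample, 4)
```

If the sample is representative of production keys, each resulting tablet should receive a similar share of writes from the start.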

Page 64: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Effect of Compression

• Similar sorted keys compress well

• May need more data than you think to auto-split

Page 65: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Inserts are fast: 10s of thousands per second per machine

Page 66: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Updates *can* be …

Page 67: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update Types

• Overwrite

• Combine

• Complex

Page 68: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Overwrite

• Performance same as insert

• Ignore (don’t read) existing value

• Accumulo’s Versioning Iterator does the overwrite

Page 69: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Overwrite

RowID Col Value

af362de4 name Annie

af362de4 age 32

af362de4 account $25

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

userB:age -> 34

Page 70: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Overwrite

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $25

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

userB:age -> 34

Page 71: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Combine

• Things like X = X + 1

• Normally one would have to read the old value to do this, but Accumulo Iterators allow multiple inserts to be combined at scan time or during compaction

• Performance is same as inserts

Page 72: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Combine

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $25

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

userB:account -> +10

Page 73: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Combine

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $25

af362de4 account $10

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

userB:account -> +10

Page 74: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Combine

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $25

af362de4 account $10

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

getAccount(userB) $35

Page 75: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Combine

After compaction

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $35

c48e2ade name Joe

c48e2ade age 59

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30
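The combine pattern on the preceding slides can be simulated in a few lines. This is a toy model of what a summing combiner does, not Accumulo's iterator API: `SummingTable` is a hypothetical class, amounts are plain integers, and the list of entries stands in for the files of a tablet.

```python
from collections import defaultdict


class SummingTable:
    """Toy table where an update is just another insert under the same key."""

    def __init__(self):
        self.entries = []  # (row, column, amount), never read before writing

    def insert(self, row: str, col: str, amount: int) -> None:
        """Blind write: same cost as any other insert."""
        self.entries.append((row, col, amount))

    def scan(self, row: str, col: str) -> int:
        """Combine at scan time: sum every entry stored under the key."""
        return sum(v for r, c, v in self.entries if (r, c) == (row, col))

    def compact(self) -> None:
        """Combine at compaction: rewrite storage with one summed entry per key."""
        totals = defaultdict(int)
        for r, c, v in self.entries:
            totals[(r, c)] += v
        self.entries = [(r, c, v) for (r, c), v in sorted(totals.items())]


table = SummingTable()
table.insert("af362de4", "account", 25)  # original balance
table.insert("af362de4", "account", 10)  # userB:account -> +10
balance_before_compaction = table.scan("af362de4", "account")
table.compact()
```

The writer never reads the old balance; the sum is produced when someone scans, and compaction later collapses the stored entries into one.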

Page 76: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Complex

• Some updates require looking at more data than Iterators have access to - such as multiple rows

• These require reading the data out in order to write the new value

• Performance will be much slower

Page 77: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Complex

userC:account = getBalance(userA) + getBalance(userB)

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $35

c48e2ade name Joe

c48e2ade age 59

c48e2ade account $40

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

35+30 = 65

Page 78: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Update - Complex

userC:account = getBalance(userA) + getBalance(userB)

RowID Col Value

af362de4 name Annie

af362de4 age 34

af362de4 account $35

c48e2ade name Joe

c48e2ade age 59

c48e2ade account $65

e2e4dac4 name Bob

e2e4dac4 age 43

e2e4dac4 account $30

35+30 = 65

Page 79: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Planning a Larger-Scale Cluster: 10^2 - 10^4

Page 80: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Storage vs Ingest

[Chart: ingest rate (millions of entries per second) and storage (terabytes) against number of machines, log scale]

Storage in terabytes by drive configuration:

Machines 1x1TB 12x3TB

10 10 120

100 100 1,200

1,000 1,000 12,000

10,000 10,000 120,000

Page 81: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Model for Ingest Rates

$A = 0.85^{\log_2 N} \cdot N \cdot S$

N - Number of machines

S - Single server throughput (entries / second)

A - Aggregate cluster throughput (entries / second)

Expect about 1.7x the write rate (85% of the ideal 2x) when doubling the size of the cluster
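The model is easy to evaluate directly. A small sketch, assuming the formula reads A = 0.85^(log2 N) * N * S, which is the reading consistent with the constant 0.7655347 (log2 of 1.7) on the next slide; `aggregate_rate` and the sample per-server rate are illustrative, not figures from the talk.

```python
import math


def aggregate_rate(n_machines: int, single_server_rate: float,
                   efficiency: float = 0.85) -> float:
    """Aggregate throughput A = efficiency^(log2 N) * N * S, in entries per second."""
    return efficiency ** math.log2(n_machines) * n_machines * single_server_rate


# Hypothetical single-server rate, used only to show the shape of the curve.
S = 100_000
rates = {n: aggregate_rate(n, S) for n in (1, 2, 4, 8)}
```

Under this reading, every doubling multiplies the aggregate rate by 2 * 0.85 = 1.7, so throughput keeps growing with cluster size but falls progressively further below perfectly linear scaling.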

Page 82: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Estimating Machines Required

$N = 2^{\log_2(A/S) / 0.7655347}$

N - Number of machines

S - Single server throughput (entries / second)

A - Target aggregate throughput (entries / second)

Expect about 1.7x the write rate (85% of the ideal 2x) when doubling the size of the cluster
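Inverting the ingest model gives the machine count for a target rate. A sketch under the same reading of the formula, where the constant 0.7655347 is log2(2 * 0.85); `machines_required` is an illustrative name and returns a fractional count that a planner would round up.

```python
import math


def machines_required(target_rate: float, single_server_rate: float,
                      efficiency: float = 0.85) -> float:
    """N = 2^(log2(A/S) / log2(2 * efficiency)); round up when sizing a cluster."""
    exponent = math.log2(2 * efficiency)  # about 0.7655347 for efficiency 0.85
    return 2 ** (math.log2(target_rate / single_server_rate) / exponent)


# Hypothetical example: reach 1.7x a single server's rate.
n_for_double_step = machines_required(170_000, 100_000)
```

A useful sanity check is the round trip: feeding the output of the forward model for N machines back into this function should return N.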

Page 83: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Predicted Cluster Sizes

[Chart: number of machines (0 to 12,000) against millions of entries per second (0 to 600)]

Page 84: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

100 machines: 10^2

Page 85: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Multiple racks

Page 86: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

10 TB RAM

Page 87: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

100 TB - 1 PB Disk

Page 88: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Some hardware failures in the first week (burn in)

Page 89: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Expect 3 failed hard drives in the first 3 months

Page 90: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Another 4 within the first year

http://static.googleusercontent.com/media/research.google.com/en/us/archive/disk_failures.pdf

Page 91: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Can process the 1000 Genomes data set

260 TB

www.1000genomes.org

Page 92: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Can store and index the Common Crawl Corpus

2.8 Billion web pages 541 TB

commoncrawl.org

Page 93: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

One year of Twitter: 182 billion tweets

483 TB

http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm

Page 94: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Deploying an Application

[Diagram: Users, Clients, Tablet Servers]

Page 95: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

May not see the effect of writing to disk for a while

Page 96: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1000 machines: 10^3

Page 97: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Multiple rows of racks

Page 98: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

100 TB RAM

Page 99: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1-10 PB Disk

Page 100: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Hardware failure is a regular occurrence

Page 101: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Hard drive failure about every 5 days (average).

Will be skewed towards beginning of the year
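The five-day figure follows from the 100-machine slides (3 failed drives in the first 3 months plus another 4 within the first year) if drive failures scale linearly with machine count. A quick check, assuming identical machines and ignoring the early-life skew noted above; `days_between_drive_failures` is an illustrative helper.

```python
# From the 100-machine slides: 3 failures in the first 3 months, 4 more in the first year.
FAILURES_PER_100_MACHINES_PER_YEAR = 3 + 4


def days_between_drive_failures(n_machines: int) -> float:
    """Average days between drive failures, assuming failures scale linearly."""
    failures_per_year = FAILURES_PER_100_MACHINES_PER_YEAR * n_machines / 100
    return 365 / failures_per_year


interval_at_1000 = days_between_drive_failures(1000)
```

At 1000 machines this gives 70 drive failures a year, or one roughly every 5.2 days.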

Page 102: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Can traverse the ‘brain graph’ 70 trillion edges, 1 PB

http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Page 103: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Facebook Graph 1s of PB

http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf

Page 104: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Netflix Video Master Copies 3.14 PB

http://www.businessweek.com/articles/2013-05-09/netflix-reed-hastings-survive-missteps-to-join-silicon-valleys-elite

Page 105: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

World of Warcraft Backend Storage 1.3 PB

http://www.datacenterknowledge.com/archives/2009/11/25/wows-back-end-10-data-centers-75000-cores/

Page 106: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Webpages, live on the Internet 14.3 Trillion

http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html

Page 107: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Things like the difference between two compression algorithms start to make a big difference

Page 108: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Use range compactions to effect changes on portions of a table

Page 109: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Lay off Zookeeper

Page 110: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Watch Garbage Collector and Namenode ops

Page 111: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Garbage Collection > 5 minutes?

Page 112: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Start thinking about NameNode Federation

Page 113: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo 1.6

Page 114: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Multiple NameNodes

[Diagram: Accumulo over two NameNodes, each with its own DataNodes]

Multiple HDFS Clusters

Page 115: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Multiple NameNodes

[Diagram: Accumulo over two NameNodes sharing one set of DataNodes]

Multiple NameNodes, shared DataNodes (Federation; requires Hadoop 2.0)

Page 116: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

More Namenodes = higher risk of one going down.

Can use HA Namenodes in conjunction w/ Federation

Page 117: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

10,000 machines: 10^4

Page 118: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

You, my friend, are here to kick a** and chew bubble gum

Page 119: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1 PB RAM

Page 120: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

10-100 PB Disk

Page 121: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

1 hardware failure every hour on average

Page 122: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Entire Internet Archive 15 PB

http://www.motherjones.com/media/2014/05/internet-archive-wayback-machine-brewster-kahle

Page 123: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

A year’s worth of data from the Large Hadron Collider

15 PB

http://home.web.cern.ch/about/computing

Page 124: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

0.1% of all Internet traffic in 2013 43.6 PB

http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html

Page 125: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Facebook Messaging Data 10s of PB

http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf

Page 126: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Facebook Photos 240 billion

High 10s of PB

http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf

Page 127: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Must use multiple NameNodes

Page 128: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Can tune back heartbeats, periodicity of central processes in general

Page 129: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Can combine multiple PB data sets

Page 130: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Up to 10 quadrillion entries in a single table

Page 131: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

While maintaining sub-second lookup times

Page 132: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Only with Accumulo 1.6

Page 133: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Dealing with data over time

Page 134: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Data Over Time - Patterns

• Initial Load

• Increasing Velocity

• Focus on Recency

• Historical Summaries

Page 135: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Initial Load

• Get a pile of old data into Accumulo fast

• Latency not important (data is old)

• Throughput critical

Page 136: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Bulk Load RFiles

Page 137: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Bulk Loading

MapReduce

RFiles Accumulo

Page 138: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Increasing velocity

Page 139: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

If your data isn’t big today, wait a little while

Page 140: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo scales up dynamically, online. No downtime

Page 141: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

The first scale, ‘can change size’

Page 142: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

3 physical servers, each running a Tablet Server process and a Data Node process

Page 143: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

Start 3 new Tablet Server processes and 3 new Data Node processes

Page 144: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

Master immediately assigns tablets

Page 145: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

Clients immediately begin querying new Tablet Servers

Page 146: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

New Tablet Servers read data from old Data Nodes

Page 147: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaling Up

[Diagram: Clients, Accumulo, HDFS]

New Tablet Servers write data to new Data Nodes

Page 148: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Never really seen anyone do this

Page 149: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Except myself

Page 150: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

20 machines in Amazon EC2

Page 151: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

to 400 machines

Page 152: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

all during the same MapReduce job reading data out of Accumulo, summarizing, and writing back

Page 153: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Scaled back down to 20 machines when done

Page 154: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Just killed Tablet Servers

Page 155: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Decommissioned Data Nodes for safe data consolidation to remaining 20 nodes

Page 156: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Other ways to go from 10^x to 10^(x+1)

Page 157: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo Table Export

Page 158: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

followed by HDFS DistCP to new cluster

Page 159: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Maybe new replication feature

Page 160: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Newer Data is Read more Often

Page 161: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo keeps newly written data in memory

Page 162: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Block Cache can keep recently queried data in memory

Page 163: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Combining Iterators make maintaining summaries of large amounts of raw events easy

Page 164: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Reduces storage burden

Page 165: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Historical Summaries

[Chart: April through July; series: Unique Entities Stored, Raw Events Processed; vertical axis 0 to 8000]

Page 166: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Age-off iterator can automatically remove data over a certain age
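The idea behind age-off can be sketched as a filter over timestamped entries. This is a simplified stand-in, not Accumulo's iterator interface: `age_off` is a hypothetical function, timestamps are milliseconds, and the entries are plain tuples.

```python
def age_off(entries: list, now_ms: int, ttl_ms: int) -> list:
    """Keep only entries whose timestamp is within ttl_ms of now_ms."""
    return [(key, ts, value) for key, ts, value in entries if now_ms - ts <= ttl_ms]


DAY_MS = 24 * 60 * 60 * 1000
now = 100 * DAY_MS
entries = [
    ("event-old", now - 40 * DAY_MS, "a"),   # older than the 30-day limit
    ("event-edge", now - 30 * DAY_MS, "b"),  # exactly at the limit, kept
    ("event-new", now - 1 * DAY_MS, "c"),
]
kept = age_off(entries, now_ms=now, ttl_ms=30 * DAY_MS)
```

Applied at scan time such a filter hides expired data from readers; applied during compaction it drops the data from storage, so no separate delete pass is needed.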

Page 167: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

IBM estimates 2.5 exabytes of data is created every day

http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Page 168: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

90% of available data created in last 2 years

http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Page 169: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

25 new 10k node Accumulo clusters per day
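The figure of 25 clusters per day is consistent with dividing IBM's 2.5 exabytes per day by the 100 PB upper bound quoted earlier for a 10,000-node cluster. A back-of-the-envelope check, assuming decimal units (1 EB = 1000 PB) and that the 100 PB figure is the one intended:

```python
PB_PER_EB = 1000                # decimal units, an assumption
daily_data_pb = 2.5 * PB_PER_EB  # IBM estimate: 2.5 EB created per day
cluster_capacity_pb = 100        # upper end of "10-100 PB Disk" at 10,000 machines

clusters_per_day = daily_data_pb / cluster_capacity_pb
```

With the lower bound of 10 PB per cluster, the same division would give 250 clusters per day.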

Page 170: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Accumulo is doing its part to get in front of the big data trend

Page 171: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

Questions?

Page 172: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters

@aaroncordova