replayable bigdata for multicore processing and statistically rigid sketching


M.Zhanikeev -- [email protected] -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 2/23...


Hadoop/MapReduce Problems

• upper limit on throughput, both in HDFS and in MapReduce jobs [06] [09]

• one machine is not so bad anymore: good RAM, multicore [09]

• MapReduce is key-value hashes only, very restrictive

• MapReduce performs badly under heterogeneous workloads

• the small file problem is hard to solve in Hadoop [10]

• ...

[06] K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)

[09] A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st HTCDP (2012)



Hadoop/MapReduce Upgrades

• solutions for heterogeneous jobs [17]

• R for statistical processing on top of HDFS shards [18]

• searchable shards -- HBase + Lucene [19]

• ... but still not enough

[17] A.Rasooli+1 "COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop..." McMaster (2013)

[18] S.Das+5 "Ricardo: Integrating R and Hadoop" SIGMOD (2010)

[19] X.Gao+2 "Experimenting with Lucene Index on HBase in an HPC Environment" 1st HPCDB (2012)



Example : Superspreaders


Example Problem: Superspreaders

A superspreader is a one-to-many (o2m) traffic artifact where one host tries to contact many other hosts within a short timespan

• reverse of Superspreader is a Flash Crowd -- same algorithm

• many-to-many (m2m, groups) is even more complex

• a known problem [16]

QUIZ: How to detect superspreaders using MapReduce? Any ideas?

[16] S.Venkataraman+3 "New Streaming Algorithms for Fast Detection of Superspreaders" NDSS (2005)



Superspreaders: (1) raw packets

• time, sip, sport, dip, dport, psize, protocol -- the usual packet tuple

• convert into text for MapReduce jobs
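
The conversion to text can be as simple as one line per packet. The following is a minimal sketch, assuming tab-separated fields in the order of the tuple above; the `Packet` class, the `to_line` and `from_line` helpers and the sample values are hypothetical and not taken from the demo.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Packet:
    """One packet record; field order follows the tuple on the slide."""
    time: float      # capture timestamp in seconds (assumed unit)
    sip: str         # source IP
    sport: int       # source port
    dip: str         # destination IP
    dport: int       # destination port
    psize: int       # packet size in bytes (assumed unit)
    protocol: str


def to_line(p: Packet) -> str:
    """Serialize a packet as one tab-separated text line."""
    return "\t".join(str(v) for v in astuple(p))


def from_line(line: str) -> Packet:
    """Parse a line produced by to_line back into a Packet."""
    t, sip, sport, dip, dport, psize, proto = line.rstrip("\n").split("\t")
    return Packet(float(t), sip, int(sport), dip, int(dport), int(psize), proto)


sample = Packet(1.5, "10.0.0.1", 40000, "10.0.0.2", 80, 60, "tcp")
line = to_line(sample)
```

A line-oriented format is chosen here only because text input splits naturally into per-line map tasks.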



Superspreaders: (2) 1st MapReduce

Step 1 is to extract unique sip-dip pairs

• 3rd column is the count (word counting was taken as the basis)

• output is ordered by sip

• takes time because it needs to process all the data
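
Step 1 can be imitated on one machine in a few lines. This is a sketch in plain Python, not Hadoop code: `run_job` is a hypothetical stand-in for the shuffle phase, and the tab-separated input format is an assumption.

```python
from collections import defaultdict


def map_pairs(line):
    """Mapper: emit ((sip, dip), 1) for one text line of the packet tuple."""
    fields = line.split("\t")
    yield (fields[1], fields[3]), 1


def reduce_counts(key, values):
    """Reducer: word-count style sum of the values for one key."""
    return key, sum(values)


def run_job(lines, mapper, reducer):
    """Toy single-process job: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # sorting the keys gives output ordered by sip, then dip
    return [reducer(key, groups[key]) for key in sorted(groups)]


lines = [
    "1.0\t10.0.0.1\t40000\t10.0.0.2\t80\t60\ttcp",
    "1.1\t10.0.0.1\t40001\t10.0.0.3\t80\t60\ttcp",
    "1.2\t10.0.0.1\t40000\t10.0.0.2\t80\t60\ttcp",
    "1.3\t10.0.0.9\t40000\t10.0.0.2\t80\t60\ttcp",
]
pairs = run_job(lines, map_pairs, reduce_counts)
```

Every input line passes through the mapper, which is why this step is the slow one.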



Superspreaders: (3) 2nd MapReduce

Step 2 is to count how often each sip occurs among the unique sip-dip pairs, i.e. how many distinct dips it contacted

• based on the output of the 1st job

• faster because the data is relatively small
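
Step 2 reduces to a per-sip count over the unique pairs. This is a minimal sketch, assuming the 1st job's output is available as (sip, dip, count) rows; the function names and the `threshold` parameter are hypothetical, since the slides do not state a cutoff.

```python
from collections import Counter


def count_fanout(pair_rows):
    """Count distinct dips per sip from (sip, dip, count) rows.

    Each row is one unique pair, so counting rows per sip gives the
    number of distinct destinations that sip contacted.
    """
    fanout = Counter()
    for sip, _dip, _count in pair_rows:
        fanout[sip] += 1
    return fanout


def superspreaders(fanout, threshold):
    """Return the sips that contacted at least `threshold` distinct dips."""
    return sorted(sip for sip, n in fanout.items() if n >= threshold)


rows = [
    ("10.0.0.1", "10.0.0.2", 2),
    ("10.0.0.1", "10.0.0.3", 1),
    ("10.0.0.1", "10.0.0.4", 5),
    ("10.0.0.9", "10.0.0.2", 1),
]
fanout = count_fanout(rows)
```

The packet count in the 3rd column is ignored here: a host that sends many packets to a single destination is not a superspreader.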



Superspreaders: Problem NOT Solved

• no time/sequence in MapReduce, so no way to process data in a time window

• no way to know what to discard midway (tail, small counts) until all data is aggregated -- ineffective for large datasets

• ...



Proposal: TABID + Sketches



TABID: Time-Aware BIg Data

• TABID: acronym of Time-Aware BIg Data

• the grain is larger than in a key-value store, but better than HDFS shards [19]

[Figure: comparison of storage granularity. Labels: KV Store; Hadoop (HDFS) and MapReduce; TABID Time-Aware Big Data (this demo); HDFS + Lucene Index]

[19] X.Gao+2 "Experimenting with Lucene Index on HBase in an HPC Environment" 1st HPCDB (2012)



Sketches (Data Streaming)

Data Streaming is a new concept where data is processed in realtime without using any storage

• a relatively recent but well defined and mathematically/statistically formulated field [11]

• many known interesting algorithms [12]

• algorithms for specific problems [14] [15] [16]

• main features: space efficiency, statistical rigidity (information theory), speed

[11] S.Muthukrishnan "Data Streams: Algorithms and Applications" Foundations and Trends... (2005)

[12] M.Sung+3 "Scalable and Efficient Data Streaming Algorithms for Detecting ...." ICDEW (2006)
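
To make "space efficiency" concrete, here is a small Count-Min sketch. It is a well-known streaming structure chosen purely as an illustration; the slides do not name it, and the `width` and `depth` defaults are arbitrary assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator: never underestimates a count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        # depth x width counters, regardless of how many items are seen
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # one independent-looking hash per row, obtained by salting
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        # collisions can only inflate a counter, so the minimum is safest
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


cms = CountMinSketch()
for _ in range(5):
    cms.add("10.0.0.1")
cms.add("10.0.0.2", 2)
```

Memory stays at `width * depth` counters however long the stream runs, at the cost of possible overestimates when items collide.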



Hadoop/MapReduce

Hadoop is a platform where your software meets with data shards

[Figure: Hadoop architecture diagram. Labels: One Physical Machine (1 shard); file A, file B, file C, ...; Hadoop Space; Manager; Hadoop Job (your code), many; Name Server(s); Client Machine; Hadoop Client; Your Code; You. Arrow labels: Start, Use, Deploy, Find, Read/parse]

• shards are distributed across the cluster

• the nameserver is the bottleneck

• fairness problems arise when heavy and light jobs run together on the same shard



TABID/Sketches

TABID is an API that helps your jobs get access to data in its natural time order

[Figure: TABID architecture diagram. Labels: One Physical Machine (1 shard); Timeline; Sub-Store; Store; TABID Node; TABID Manager; Client Machine; TABID Client; Your Sketcher; You; Multicore Replay. Arrow labels: Start, Use, Schedule]

• data shards are downloaded and replayed

• jobs run on multicore

• jobs can use any data structure (well beyond key-value)

• jobs are data streaming sketches -- can be selected from a library
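
Replay in time order is what makes a time-window job possible. The code below is a sketch of the idea only and does not use the actual TABID API: `replay` and `WindowedFanout` are hypothetical names, shards are assumed to be pre-sorted by time, and the bookkeeping is exact rather than a space-efficient sketch.

```python
import heapq
from collections import defaultdict, deque


def replay(*shards):
    """Merge shards (each sorted by time) into one time-ordered stream."""
    return heapq.merge(*shards, key=lambda rec: rec[0])


class WindowedFanout:
    """Per sip, track the distinct dips seen in the last `window` seconds."""

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.events = deque()  # (time, sip, dip) in arrival order
        self.seen = defaultdict(lambda: defaultdict(int))  # sip -> dip -> packets
        self.alerts = []

    def feed(self, time, sip, dip):
        # expire records that have fallen out of the window
        while self.events and self.events[0][0] <= time - self.window:
            _, old_sip, old_dip = self.events.popleft()
            self.seen[old_sip][old_dip] -= 1
            if self.seen[old_sip][old_dip] == 0:
                del self.seen[old_sip][old_dip]
        self.events.append((time, sip, dip))
        self.seen[sip][dip] += 1
        if len(self.seen[sip]) >= self.threshold:
            self.alerts.append((time, sip))


shard_a = [(0.0, "10.0.0.1", "10.0.0.2"), (2.0, "10.0.0.1", "10.0.0.4")]
shard_b = [(1.0, "10.0.0.1", "10.0.0.3"), (9.0, "10.0.0.1", "10.0.0.5")]
sketch = WindowedFanout(window=5.0, threshold=3)
for time, sip, dip in replay(shard_a, shard_b):
    sketch.feed(time, sip, dip)
```

In this toy run the host reaches three distinct destinations within the window at time 2.0; by time 9.0 the earlier contacts have expired, so no second alert is raised.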



TABID: BigData Replay

[Figure: replay diagram. Labels: Core 1 to Core X; TABID Manager; Now (replay); Time-Aligned Big Data Cursor; Time Direction; One Sketch (three instances, each with Start and End); Read/prepare; Shared Memory]



TABID: Lockfree Shared Memory

• lockfree shared memory is a new branch of parallel processing

• locks are bad for multicore (memory fences [20])

• a generic lockfree design in [05] and a software implementation in [23]

• some attempts to apply multicore to MapReduce [21] [22]

[20] M.Aldinucci+2 "FastFlow: Efficient Parallel Streaming Applications on Multi-core" Universita di Pisa (2009)

[23] current "MCoreMemory project page" https://github.com/maratishe/mcorememory (myself)



Wrapup (Problem→Solution)

1. MapReduce is about counting, no rich datatypes

◦ SOLVED → any datatype, stored as JSON

2. MapReduce has no solution for heterogeneous jobs

◦ SOLVED → TABID optimizes jobs-to-core mapping (bin packing)

3. MapReduce has an accountability problem because clients make their own jobs

◦ SOLVED → a library of sketches based on known data streaming algorithms

4. ... many smaller solutions
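
The slides do not say which packing heuristic TABID uses for item 2. As an illustration only, the sketch below applies the greedy longest-first rule, a common load-balancing relative of bin packing: jobs are taken heaviest first and each goes to the currently least-loaded core. The function name, the job names and the costs are hypothetical.

```python
import heapq


def map_jobs_to_cores(job_costs, n_cores):
    """Greedy longest-first mapping of jobs to a fixed number of cores."""
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    mapping = {core: [] for core in range(n_cores)}
    # heaviest job first; ties broken by job name for determinism
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)  # least-loaded core
        mapping[core].append(job)
        heapq.heappush(heap, (load + cost, core))
    loads = {c: sum(job_costs[j] for j in jobs) for c, jobs in mapping.items()}
    return mapping, loads


jobs = {"s1": 8, "s2": 7, "s3": 6, "s4": 5, "s5": 4}
mapping, loads = map_jobs_to_cores(jobs, n_cores=2)
```

The heuristic is fast but not optimal: on this input it produces loads of 17 and 13, while a 15/15 split exists.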



Hadoop vs TABID DEMO (today)

• working Hadoop and TABID platforms

• can play with various configurations on the spot

Marat Zhanikeev – [email protected] – maratishe.github.io – http://bit.do/marat141105



That’s all, thank you ...



Performance under Distributions

• comparative efficiency (right) and job-to-core mapping efficiency (left)

• all under heterogeneous (variance) job distributions

[Figure, first plot: x-axis "Number of Sketches" (0 to 1000); y-axis "Log of Sketchbyte Ratio (HADOOP/TABID)" (3.8 to 5.4); curves labelled with tuples <min lifespan, max lifespan, exponent>: <10,1000,0>, <10,1000,0.7>, <10,1000,0.3>, <10,1000,0.1>, <10,1000,0.05>, <10,1000,0.01>; annotation "More longer lifespans"]

[Figure, second plot: x-axis "Number of Sketches" (0 to 1000); y-axis "Log of Max TABID Overhead (per core)" (3.15 to 6.65); curves labelled with tuples <min overhead, max overhead, exponent>: <100,10000,0>, <100,10000,0.7>, <100,10000,0.3>, <100,10000,0.1>, <100,10000,0.05>, <100,10000,0.01>; annotation "Mostly large overhead"]



Shared Memory : Logic

[Figure: state diagram of the shared memory logic. Labels: Shared Memory; Ring buffer; Data Stream; Cursor Time; Sketch time; All sketches; Sketch Manager; IDLE (twice); Global start; Sketch started; Set Sketch Time; Monitor cursor; Cursor moved; Each item; Process Data; Add data; Cursor = sketch time; Wait for all sketches; All sketches caught up; End of data; End of life]

• the cursor is written by the manager and read by jobs on cores

• only the manager writes, cores only read -- a lockfree design

• zero collisions guaranteed by the API
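
The single-writer rule can be shown with a single-process simulation. This is an interpretation of the diagram, not the MCoreMemory implementation: all class names are hypothetical, plain Python objects stand in for shared memory, and each sketch is assumed to publish its own "sketch time" in a slot that only it writes.

```python
class SharedState:
    """Stand-in for the shared memory segment."""

    def __init__(self, ring_size, n_sketches):
        self.ring = [None] * ring_size
        self.cursor = 0                      # written only by the manager
        self.sketch_time = [0] * n_sketches  # slot i written only by sketch i


class Manager:
    def __init__(self, state, data):
        self.state, self.data = state, data

    def step(self):
        """Publish one item unless that would overwrite unread data."""
        s = self.state
        if s.cursor >= len(self.data):
            return False  # end of data
        if s.cursor - min(s.sketch_time) >= len(s.ring):
            return False  # wait for all sketches to catch up
        s.ring[s.cursor % len(s.ring)] = self.data[s.cursor]
        s.cursor += 1     # move the cursor only after the item is in place
        return True


class Sketch:
    def __init__(self, state, index):
        self.state, self.index, self.total = state, index, 0

    def step(self):
        """Consume one item if the cursor is ahead of this sketch's time."""
        s = self.state
        mine = s.sketch_time[self.index]
        if mine >= s.cursor:
            return False  # idle: cursor equals sketch time
        self.total += s.ring[mine % len(s.ring)]
        s.sketch_time[self.index] = mine + 1
        return True


data = list(range(1, 11))
state = SharedState(ring_size=4, n_sketches=2)
manager = Manager(state, data)
sketches = [Sketch(state, 0), Sketch(state, 1)]
round_no = 0
while state.cursor < len(data) or min(state.sketch_time) < len(data):
    manager.step()
    sketches[0].step()
    if round_no % 2 == 0:  # the second sketch is deliberately slower
        sketches[1].step()
    round_no += 1
```

Because every shared cell has exactly one writer, no lock is needed; the slow sketch only forces the manager to pause when the ring buffer is full.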



Related Subjects

• rigid statistics in traffic analysis

◦ QoS of user communities [01], dynamic network management, etc.

• isn't BigData replay a bottleneck?

◦ not really, circuits for bulk transfer [02] or multisource aggregation [03] can support very high throughputs

◦ MapReduce has several bottlenecks: namespace lookup, HDFS, etc. -- see [06] for details

• Why BigData Replay? -- it's a new ecosystem

◦ bigdata hoarders announce replays, openly collect public jobs until a deadline, open outcome

◦ one good solution for bigdata → opendata initiative

[01] myself+0 "A holistic community-based architecture for measuring end-to-end QoS at data centres" IJCSE (2014)

[02] myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, Wiley (in print) (2015)

[03] myself+0 "Multi-Source Stream Aggregation in the Cloud" Wiley (2014)

[06] K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)


[01] myself+0 (2014) "A holistic community-based architecture for measuring end-to-end QoS at data centres" IJCSE

[02] myself+0 (2015) "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, Wiley (in print)

[03] myself+0 (2014) "Multi-Source Stream Aggregation in the Cloud" Wiley

[04] myself+0 (2014) "Optimizing Virtual Machine Migration for Energy-Efficient Clouds" IEICEJ

[05] myself+0 (2014) "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM

[06] K.Shvachko (2012) "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2

[07] Y.Chen+3 (2011) "The Case for Evaluating MapReduce Performance Using Workload Suites" MASCOTS

[08] Z.Ren+4 (2012) "Workload Characterization on a Production Hadoop Cluster..." IEEE Workload...

[09] A.Rowstron+4 (2012) "Nobody ever got fired for using Hadoop on a cluster" 1st HTCDP

[10] (current) "Small File Problem in Hadoop (blog)" http://amilaparanawithana.blogspot.jp/2012

[11] S.Muthukrishnan (2005) "Data Streams: Algorithms and Applications" Foundations and Trends...

[12] M.Sung+3 (2006) "Scalable and Efficient Data Streaming Algorithms for Detecting ...." ICDEW

[13] Z.Bar-Yossef+2 (2002) "...streaming algorithms, with an application to counting triangles in graphs" ACM SODA

[14] M.Charikar+2 (2002) "Finding frequent items in data streams" 29th International Colloquium on Automata...

[15] M.Datar+3 (2002) "Maintaining stream statistics over sliding windows" SIAM

[16] S.Venkataraman+3 (2005) "New Streaming Algorithms for Fast Detection of Superspreaders" NDSS

[17] A.Rasooli+1 (2013) "COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop..." McMaster

[18] S.Das+5 (2010) "Ricardo: Integrating R and Hadoop" SIGMOD

[19] X.Gao+2 (2012) "Experimenting with Lucene Index on HBase in an HPC Environment" 1st HPCDB

[20] M.Aldinucci+2 (2009) "FastFlow: Efficient Parallel Streaming Applications on Multi-core" Universita di Pisa

[21] R.Brightwell (2008) "Workshop on Managed Many-Core Systems" 1st Workshop ...Many-Core Systems

[22] R.Chen+2 (2010) "Tiled-MapReduce: Optimizing Resource Usages of Data-parallel... with Tiling" 19th PACT

[23] current (myself) "MCoreMemory project page" https://github.com/maratishe/mcorememory
