MapReduce Online. Tyson Condie, UC Berkeley. Slides by Kaixiang MO (kxmo@cse.ust.hk)


Page 1: MapReduce Online Tyson Condie UC Berkeley Slides by Kaixiang MO kxmo@cse.ust.hk

MapReduce Online

Tyson Condie, UC Berkeley

Slides by Kaixiang MO, kxmo@cse.ust.hk

Page 2:

Outline

• Background
• Motivation: Blocking vs pipelining
• Hadoop Online Prototype model
• Pipelining within a job
• Online aggregation
• Continuous queries
• Conclusion

Page 3:

Background

• MapReduce system
– Massive data parallelism, batch-oriented, high throughput
– Fault tolerance by materializing results on HDFS

• However:
– Stream processing: analyzing streams of data as they arrive
– Online aggregation: using results interactively as they accumulate

Page 4:

Motivation: Batch vs online

• Batch:
– Reducers begin only after all map tasks finish
– High throughput, high latency

• Online:
– Stream processors are usually not fault tolerant
– Lower latency

• Blocking does not fit online/stream data:
– It produces final answers only
– It cannot handle infinite streams

• Fault tolerance is important: how can a pipelined system keep it?

Page 5:

Map Reduce Job

• Map step
– Parse the input into words
– For each word, emit <word, 1>

• Reduce step
– For each word, receive its list of counts
– Sum the counts and emit <word, sum>

• Combine step (optional)
– Pre-aggregate the map output at the mapper
– Same logic as reduce
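As a minimal sketch, the three steps above can be written in plain Python (a single-process illustration of the map/combine/reduce logic, not Hadoop's API; all names here are invented for this example):

```python
from collections import defaultdict

def map_step(line):
    # Parse input into words; emit <word, 1> pairs.
    return [(word, 1) for word in line.split()]

def combine_step(pairs):
    # Optional pre-aggregation at the mapper: same logic as reduce.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reduce_step(pairs):
    # For each word, sum its counts and emit <word, sum>.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
mapped = []
for line in lines:
    mapped.extend(combine_step(map_step(line)))
result = reduce_step(mapped)  # result["the"] == 2
```

In a real cluster the mapped pairs would be partitioned by key and shuffled to reducers; here everything runs in one process to show the data flow.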

Page 6:

Map Reduce steps

• The client submits the job; the master schedules mappers and reducers

• Map step, group (sort) step, combine step (optional)

• Commit step
• Map finishes

Page 7:

Map Reduce steps

• The master tells reducers the map output locations
• Shuffle (pull data) step, group (full sort) step
– Does it start too late?

• Reduce step
• Reduce finishes
• Job finishes

Page 8:

Hadoop Online Prototype (HOP)

• Major change: pipelining between operators
– Data is pushed from mappers to reducers
– Data transfer runs concurrently with map/reduce computation
– Still fault tolerant

• Benefits
– Lower latency
– Higher utilization
– Smoother network traffic

Page 9:

Performance at a Glance

• In some cases, HOP can reduce job completion time by 25%.

Page 10:

Pipeline within Job

• Naive design: pipeline each record
– Prevents the mapper from grouping and combining
– Heavy network I/O load
– Mappers can flood and bury the reducers

• Revised design: pipeline small sorted runs (spills)
– Task thread: applies the map/reduce function and buffers output
– Spill thread: sorts and combines the buffer, then spills it to a file
– TaskTracker: serves spill files to consumers on request
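A toy sketch of the revised design, using an invented `SpillingMapper` class (real HOP does this with separate task and spill threads writing files; this collapses it into one object to show the idea): output is buffered, each full buffer is sorted and combined into a small sorted run, and the reducer can merge the runs incrementally.

```python
import heapq
from collections import defaultdict

class SpillingMapper:
    """Buffer map output; sort & combine each full buffer into a spill run."""
    def __init__(self, buffer_limit=4):
        self.buffer = []
        self.buffer_limit = buffer_limit
        self.spills = []  # each spill is a small sorted, combined run

    def emit(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.spill()

    def spill(self):
        # Sort and combine the buffer, then ship it as one small run.
        combined = defaultdict(int)
        for k, v in self.buffer:
            combined[k] += v
        self.spills.append(sorted(combined.items()))
        self.buffer = []

    def merged_output(self):
        # Reducer side: merge the sorted runs received so far.
        return list(heapq.merge(*self.spills))
```

Pipelining whole sorted runs instead of single records keeps the combiner effective and avoids per-record network overhead, which is exactly the trade-off the slide describes.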

Page 11:

Utility balance control

• When mappers send early results, computation (grouping and combining) moves from the mapper to the reducer.

• If the reducer is fast: the mapper pipelines more aggressively and does less sorting and spilling.

• If the reducer is slow: the mapper pipelines less aggressively and does more sorting and spilling.

• Halt the pipeline when: the reducer backs up, or the combiner is effective.
• Resume the pipeline by: merging and combining the accumulated spill files.
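The halt/resume decision can be caricatured as a tiny policy function. The thresholds and names below are assumptions for illustration, not HOP's actual heuristics:

```python
def pipelining_policy(reducer_backlog, backlog_limit, combiner_ratio):
    """Decide whether the mapper should push spills now or keep
    accumulating and combining them locally.

    combiner_ratio: combined output size / raw input size
    (small values mean the combiner is shrinking the data a lot).
    """
    if reducer_backlog >= backlog_limit:
        return "accumulate"   # reducer is backed up: halt pipelining
    if combiner_ratio < 0.5:
        return "accumulate"   # combiner is effective: aggregate locally first
    return "push"             # reducer keeps up: pipeline aggressively
```

Accumulated spills are later merged and combined into larger runs before being pushed, which is how the pipeline resumes.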

Page 12:

Page 13:

Pipelined Fault Tolerance (PFT)

• Simple PFT design (coarse):
– Reducers treat in-progress map output as tentative
– If a map task succeeds, accept its output
– If a map task dies, discard its output

• Revised PFT design (finer):
– Record each mapper's progress; recover from the latest checkpoint
– Correctness: reduce tasks verify that spill files are intact
– Map tasks recover from the latest checkpoint, producing no redundant spill files

• The master is busier here:
– It must record the progress of each map task
– It must record whether each map output has been sent
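The coarse design amounts to simple bookkeeping on the reduce side. A hypothetical `TentativeOutputTracker` (invented for this sketch, not HOP's real code) holds pipelined spills as tentative until the map attempt commits:

```python
class TentativeOutputTracker:
    """Coarse pipelined fault tolerance: a reducer holds pipelined spills
    as tentative until the mapper attempt commits; a failed attempt's
    spills are discarded."""
    def __init__(self):
        self.tentative = {}   # map attempt id -> list of spill files
        self.accepted = []    # spills from committed map attempts

    def receive_spill(self, attempt_id, spill):
        self.tentative.setdefault(attempt_id, []).append(spill)

    def attempt_succeeded(self, attempt_id):
        # The map task finished: its output becomes authoritative.
        self.accepted.extend(self.tentative.pop(attempt_id, []))

    def attempt_failed(self, attempt_id):
        # The map task died: throw away its tentative output.
        self.tentative.pop(attempt_id, None)
```

The finer design replaces "discard everything" with per-mapper checkpoints, so a restarted map task resumes mid-input instead of resending all its spills.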

Page 14:

System fault tolerance

• Mapper fails
– A new mapper starts from the checkpoint and its output is sent to the reducers

• Reducer fails
– All mappers resend their intermediate results. Mappers still need to store intermediate results on local disk, but reducers do not have to block.

• Master fails
– The system cannot survive.

Page 15:

Online aggregation

• Show a snapshot of the reducers' current results from time to time

• Show progress (the fraction of map output each reducer has received)
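A snapshot pairs a partial aggregate with a progress metric. One naive way to read such a snapshot is to scale the partial result by the progress fraction; the estimator below is an illustration from classic online aggregation, not something HOP itself computes (HOP reports the raw partial result plus the progress number):

```python
def snapshot_estimate(partial_sum, progress):
    """Scale a partial aggregate by the fraction of input seen so far.
    progress: fraction of map output consumed, in (0, 1]."""
    if progress <= 0:
        raise ValueError("no input consumed yet")
    return partial_sum / progress

# 25% of the input seen with a running sum of 1000
# suggests a final sum of about 4000.
est = snapshot_estimate(1000, 0.25)
```

This only makes sense for aggregates that grow roughly linearly with input, which is why the progress metric is reported alongside the snapshot rather than folded into it.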

Page 16:

Pipeline between jobs

• Assume we run job1 and job2, where job2 needs job1's result.

• Snapshot job1's output and pipeline it to job2 from time to time.

• Fault tolerance:
– Job1 fails: recover as before
– Job2 fails: restart the failed tasks
– Both fail: job2 restarts from the latest snapshot

Page 17:

Continuous Queries

• Mapper: add a flush API; store output locally if the reducer is unavailable

• Reducer: runs periodically
– Triggered by wall-clock time, logical time, number of input rows, etc.
– The numbers of mappers and reducers are fixed

• Fault tolerance:
– Mappers cannot retain infinite results
– Reducers save checkpoints to HDFS
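The periodic reducer can be pictured as a trigger that fires on wall-clock time or input volume. The class and thresholds below are invented for illustration, not HOP's API:

```python
import time

class PeriodicReducer:
    """Run the reduce function when a wall-clock interval elapses
    or enough input rows have accumulated."""
    def __init__(self, reduce_fn, interval_s=30.0, max_rows=1000):
        self.reduce_fn = reduce_fn
        self.interval_s = interval_s
        self.max_rows = max_rows
        self.rows = []
        self.last_run = time.monotonic()

    def feed(self, row):
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows or
                time.monotonic() - self.last_run >= self.interval_s):
            return self.flush()
        return None  # not triggered yet

    def flush(self):
        # Emit one reduce result over the accumulated window, then reset.
        result = self.reduce_fn(self.rows)
        self.rows = []
        self.last_run = time.monotonic()
        return result
```

In the real system each flush would also be checkpointed (to HDFS, per the slide) so a restarted reducer does not lose completed windows.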

Page 18:

Performance Evaluation

Page 19:

Impact of #Reducer

Page 20:

Impact of #Reducer

• With enough reducers, HOP is faster.
• With too few reducers, HOP is slower:
– It cannot balance the workload between the mappers and reducers

Page 21:

Small vs Large block

• With large blocks, HOP is faster because the reducers do not sit idle waiting for whole blocks to finish.

Page 22:

Small vs Large block

• With small blocks, HOP is still faster, but the advantage is smaller.

Page 23:

Discussion

• HOP improves Hadoop for real-time/stream processing; it is most useful when few jobs run concurrently.

• Finer-grained control may keep the master busy, hurting scalability.

• With many jobs, it may increase computation and decrease throughput (a busier network and more per-job bookkeeping overhead at the master).