Quartet: Harmonizing task scheduling and caching for cluster computing
Francis Deslauriers, Peter McCormick, George Amvrosiadis, Ashvin Goel &
Angela Demke Brown
June 23, 2016
Analyses for the masses
• Data collection is cheap, so datasets are growing exponentially
• Cluster computing makes it easy to analyze these datasets, enabling:
◦ Queries on entire datasets
◦ Analysts running queries on the same corpus
◦ Tuning queries
Data is often re-accessed
• The result is many queries running on the same large datasets
• Leads to significant data reuse
[Figure, left: file access frequency vs. input file rank (by descending access frequency) for the CC-b, CC-c, CC-d, CC-e, and FB-2010 traces from Facebook and Cloudera customers [VLDB'12]; right: CDF of path accesses vs. cumulative distribution of input paths for the OpenCloud, M45, and WebMining CMU academic clusters [VLDB'13].]
Data reuse does not help
• We would expect data reuse to improve performance due to caching
• We find that jobs do not see the benefits of reuse
Example: Repeating a Hadoop job
Missed Opportunities
• Working sets don’t fit in the page cache
• Jobs consume data independently of one another
Solution
• Key idea
◦ Reorder work to first consume cached data
◦ Jobs are made of small tasks with no ordering requirements
• Challenges
◦ Cache visibility: Jobs need to know what data is cached on the different nodes
◦ Task reordering: Jobs need to reorder their tasks based on this knowledge
• Our Quartet system addresses both these challenges
Challenge 1: Cache Visibility
• Datanodes collect information about HDFS blocks that reside in memory
◦ Requires the Duet kernel module that informs applications when pages are cached or evicted
• Nodes send this information periodically to the Quartet Manager
◦ Changes to the number of resident pages of each block (a simplified sketch of this reporting path follows)
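To make this concrete, here is a minimal sketch of what a per-node watcher could look like. The class and method names are ours, not the actual Quartet or Duet API; the real system receives page events from the Duet kernel module and ships updates over the network to the Quartet Manager.

```java
// Hypothetical per-datanode watcher (illustrative names, not the real API).
// Page-cache add/evict events are folded into per-HDFS-block resident-page
// counts, and only blocks whose residency changed since the last report are
// sent to the Quartet Manager.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ResidencyWatcher {
    // blockId -> resident pages, as of the last report to the manager
    private final Map<String, Integer> reported = new HashMap<>();
    // blockId -> current resident pages, updated on every page event
    private final Map<String, Integer> current = new HashMap<>();

    /** Called for every page cached (+1) or evicted (-1) in a block's file. */
    void onPageEvent(String blockId, int delta) {
        current.merge(blockId, delta, Integer::sum);
    }

    /** Invoked periodically: returns only the blocks whose residency changed. */
    Map<String, Integer> buildUpdate() {
        Map<String, Integer> update = new HashMap<>();
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            if (!Objects.equals(reported.get(e.getKey()), e.getValue())) {
                update.put(e.getKey(), e.getValue());
                reported.put(e.getKey(), e.getValue());
            }
        }
        return update; // serialized and sent to the Quartet Manager
    }

    public static void main(String[] args) {
        ResidencyWatcher w = new ResidencyWatcher();
        w.onPageEvent("blk_1073741825", +32); // 32 pages of this block cached
        w.onPageEvent("blk_1073741825", -4);  // 4 of them evicted
        System.out.println(w.buildUpdate());  // {blk_1073741825=28}
        System.out.println(w.buildUpdate());  // {} -- nothing changed since last report
    }
}
```

Aggregating at block granularity is what keeps the reports small: a node only reports per-block counts, not individual page events.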
Challenge 2: Task Reordering
1. Application Master registers blocks of interest with the Quartet manager
2. Quartet manager informs the Application Master about cached blocks
3. Application Master prioritizes and places tasks based on block residency information (a sketch of the Application Master's side of this exchange follows)
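A minimal sketch of the Application Master's side, under the three steps above; the class and method names are hypothetical (this is not the actual Quartet or YARN scheduler API), and only the prioritization logic is shown, not task launch.

```java
// Illustrative Application Master-side logic (hypothetical names, not the
// actual Quartet or YARN API). Step 1 amounts to sending the job's input
// block IDs to the manager; the code below covers steps 2 and 3.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheAwareScheduler {
    private static final class Residency {
        final String node;  // node that reported the cached pages
        final int pages;    // number of resident pages of the block
        Residency(String node, int pages) { this.node = node; this.pages = pages; }
    }

    // blockId -> best known residency (the node with the most cached pages)
    private final Map<String, Residency> residency = new HashMap<>();

    /** Step 2: absorb a residency update pushed by the Quartet manager. */
    void onResidencyUpdate(String blockId, String node, int residentPages) {
        residency.merge(blockId, new Residency(node, residentPages),
                (a, b) -> a.pages >= b.pages ? a : b);
    }

    /** Step 3a: order pending input blocks so the most-cached ones run first. */
    List<String> schedulingOrder(List<String> pendingBlocks) {
        List<String> ordered = new ArrayList<>(pendingBlocks);
        ordered.sort(Comparator.comparingInt(
                (String b) -> residency.containsKey(b) ? -residency.get(b).pages : 0));
        return ordered;
    }

    /** Step 3b: preferred placement is the node holding the cached data, if any. */
    String preferredNode(String blockId) {
        Residency r = residency.get(blockId);
        return r == null ? null : r.node; // otherwise fall back to normal HDFS locality
    }
}
```

On a repeated job, blocks left in the page cache by the first run sort to the front, so the second job's early tasks hit memory while the remaining tasks read from disk as usual.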
Example: Repeating a Hadoop job
Experiments
• Spark and Hadoop implementations
• 24 nodes with a total of 384 GB of memory
• Different input sizes:
◦ Smaller than physical memory (256 GB)
◦ Slightly larger (512 GB)
◦ Approximately 3 times physical memory (1024 GB)
• 3 replicas per block
Results - Cache hit rate of the second job
[Figure: cache hit rate (%) of the second job, Baseline vs. Quartet, for Hadoop and Spark with 256 GB, 512 GB, and 1024 GB inputs; bars are annotated with absolute data volumes (Hadoop: 112 GB / 240 GB, 22 GB / 262 GB, 1 GB / 249 GB; Spark: 107 GB / 251 GB, 3 GB / 287 GB, 0 GB / 284 GB).]
Results - Runtime reduction of the second job
[Figure: normalized runtime of the second job (%), Baseline vs. Quartet, for Hadoop and Spark with 256 GB, 512 GB, and 1024 GB inputs.]
Conclusions
• Observation: Workloads show a significant amount of reuse
• Problem: Jobs are unable to take advantage of this reuse
• Solution:
◦ Add visibility into what is cached on each of the cluster nodes
◦ Reorder tasks to take advantage of this cached data
• Future work: More realistic workloads and scalability
Conclusion
Thank you!
Conclusion
Questions?
Network Overhead
• Watcher updates are aggregated per HDFS block (128-256 MB)
• The upper bound is the storage bandwidth (a back-of-the-envelope version of this bound is sketched below)
• Manager notification traffic is proportional to the input size and the hardware
◦ 10-100 KB/s in our experiments
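A back-of-the-envelope way to see why storage bandwidth bounds this traffic (our own reasoning, not from the talk; B_io, S_block, and s_entry are symbols we introduce): with updates aggregated per block, a sequential scan can only bring blocks into (or push them out of) the page cache as fast as the storage can move data.

```latex
% Assumed symbols: B_io = per-node I/O bandwidth, S_block = HDFS block size
% (128-256 MB), s_entry = size of one (block id, resident-page count) entry.
\frac{\text{update entries}}{\text{second}\cdot\text{node}} \;\lesssim\; \frac{B_{io}}{S_{block}}
\qquad\Longrightarrow\qquad
\text{notification traffic per node} \;\lesssim\; \frac{B_{io}}{S_{block}}\cdot s_{entry}
```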
Related Work
HDFS Cache Manager
• Requires manual changes when data popularity changes
• Cannot be used when the input is larger than memory
PACMan
• Avoids stragglers in a single wave of mappers
• Modifies the cache eviction policy to ensure that entire computation stages have memory locality