to share or not to share? ryan johnson nikos hardavellas, ippokratis pandis, naju mancheril, stavros...

28
To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi PARALLEL DATA LABORATORY Carnegie Mellon University **HP LABS

Upload: marquise-coiner

Post on 14-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

To Share or Not to Share?

Ryan Johnson

Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli,

Anastasia Ailamaki, Babak Falsafi

PARALLEL DATA LABORATORYCarnegie Mellon University

**HP LABS

Page 2: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 2

Motivation For Work Sharing

scan

join

aggregate

scan

output

StudentDept

Query: What is the highest undergraduate GPA?

aggregate

output

scan

Student

Query: What is the average GPA in the ECE dept.?

Page 3: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 3

Motivation For Work Sharing• Many queries in

system• Similar requests• Redundant work

• Work Sharing• Detect

redundant work• Compute results

once and share

• Big win for I/O, uniprocessors

scan

join

aggregate

scan

output

StudentDept

aggregate

output

scan

Student

2x speedup for TPC-H queries [hariz05]

Page 4: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 4

0.0

0.5

1.0

1.5

2.0

0 15 30 45Shared Queries

Sp

eed

up

du

e to

WS

1 CPU

7x

Work Sharing on Modern Hardware

8 CPU

Memory

L1

L2

core

L1

L2

core

L2

Memory

Work sharing can hurt performance!

Page 5: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 5

Contributions

• Observation• Work sharing can hurt performance on parallel

hardware

• Analysis• Develop intuitive analytical model of work sharing• Identify trade-off between total work, critical path

• Application• Model-based policy outperforms static ones by

up to 6x

Page 6: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 6

Outline

• Introduction• Part I: Intuition and Model• Part II: Analysis and Experiments

Page 7: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 7

Challenges of Exploiting Work Sharing

• Independent execution only?• Load reduction from work sharing can be useful

• Work sharing only?• Indiscriminate application can hurt performance

• To share or not to share? • System and workload dependent • Adapt decisions at runtime

Must understand work sharing to exploit it fully

Page 8: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 8

Work Sharing vs. Parallelism

Query 2

Independent Execution

Query 1

Query 2 response time

Query 1 response time

Scan

Join

Aggregate

P = 4.33

Critical Paths

Page 9: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 9

Work Sharing vs. Parallelism

Query 2

Query 1

Query 2 response time

Query 1 response time

Shared Execution

Penalty

Scan

Join

Aggregate

P = 2.75

P = 4.33

Critical path now longer

Total work and critical path both important

Page 10: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 10

Understanding Work Sharing

• Performance depends on two factors:

• Work sharing presents a trade-off• Reduces total work• Potentially lengthens critical path

Balance both factors or performance suffers

thCriticalPaTotalWorkfThroughput

1,

1

Page 11: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 11

Basis for a Model

• “Closed” system• Consistent high load• Throughput computing• Assumed in most benchmarks• Fixed number of clients

• Little’s Law governs throughput• Higher response time = lower throughput• Total work not a direct factor!

Load reduction secondary to response time

Page 12: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 12

Predicting Response Time

• Case 1: Compute-bound

• Case 2: Critical path-bound

• Larger bottleneck determines response time

stage pipeslowest at Delay max pTime

Processors Available

on UtilizatiRequested

n

uTime

Model provides u and pmax

Page 13: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 13

An Analytical Model of Work Sharing

Sharing helpful when Xshared > Xalone

)(

1,

)(min),(

max mpmu

nmnmX

Throughput for m queries and n processors

Potentially worsened by work sharing

][][

TE

mXE

Improved by work sharing

U = requested utilizationPmax = longest pipe stage

Page 14: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 14

Outline

• Introduction• Part I: Intuition and Model• Part II: Analysis and Experiments

Page 15: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 15

Experimental Setup• Hardware

• Sun T2000 “Niagara” with 16 GB RAM• 8 cores (32 threads) • Solaris processor sets vary effective CPU count

• Cordoba• Staged DBMS• Naturally exposes work sharing• Flexible work sharing policies

• 1GB TPCH dataset• Fixed Client and CPU counts per run

Page 16: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 16

Predicted vs. Measured Performance

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 15 30 45Shared Queries

Sp

eed

up

du

e to

WS 1 CPU

1 CPU model

2 CPU2 CPU model

8 CPU8 CPU model

32 CPU32 CPU model

Model Validation: TPCH Q1

Avg/max error: 5.7% / 22%

Page 17: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 17

Predicted vs. Measured Performance

0

10

20

30

0 15 30 45Clients

Sp

eed

up

du

e to

WS 1 CPU

1 CPU model

2 CPU2 CPU model

8 CPU8 CPU model

32 CPU32 CPU model

Model Validation: TPCH Q4

Behavior varies with both system and workload

Page 18: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 18

Exploring WS vs. Parallelism• Work sharing splits query

into three parts• Independent work

– Per-query, parallel– Total work

• Serial work– Per-query, serial– Critical path

• Shared work– Computed once– “Free” after first query

Serial - 4% Independent - 37%

Shared - 59%

Example:

Page 19: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 19

0

0.5

1

1.5

2

2.5

0 8 16 24 32Shared Queries

Ben

efit

fro

m W

ork

Sh

arin

g

4

8

16

32

Exploring WS vs. Parallelism

Behavior matches previously published results

Potential Speedup

CPUs

Page 20: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 20

0

0.5

1

1.5

2

2.5

0 8 16 24 32Shared Queries

Ben

efit

fro

m W

ork

Sh

arin

g

4

8

Exploring WS vs. Parallelism

Potential Speedup

CPUs

Saturated

Page 21: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 21

0

0.5

1

1.5

2

2.5

0 8 16 24 32Shared Queries

Ben

efit

fro

m W

ork

Sh

arin

g

4

8

16

32

Exploring WS vs. Parallelism

Potential Speedup

CPUs

Saturated

Page 22: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 22

0

0.5

1

1.5

2

2.5

0 8 16 24 32Shared Queries

Ben

efit

fro

m W

ork

Sh

arin

g

4

8

16

32

Exploring WS vs. Parallelism

More processors shift bottleneck to critical path

Saturated

Potential Speedup

CPUs

Page 23: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 23

Performance Impact of Serial Work

Critical path quickly becomes major bottleneck

Impact of Serial Work (32 CPU)

0

0.5

1

1.5

2

2.5

0 10 20 30 40Shared Queries

Ben

efit

fro

m W

ork

Sh

arin

g

0%

1%

2%7%

Page 24: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 24

Model-guided Work Sharing

• Integrate predictive model into Cordoba• Predict benefit of work sharing for each new query

• Consider multiple groups of queries at once• Shorter critical path, increased parallelism

• Experimental setup• Profile run with 2 clients, 2 CPUs• Extract model parameters with profiling tools• 20 clients submit mix of TPCH Q1 and Q4

Compare against always-, never-share policies

Page 25: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 25

Comparison of Work Sharing Strategies

Model-based policy balances critical path and load

2 CPU

0

50

100

150

200

All Q1 50/50 All Q4

Query Ratio

Qu

eri

es/m

in

always sharemodel guidednever share

32 CPU

0

50

100

150

200

250

Qu

eri

es/m

inAll Q1 50/50 All Q4

Query Ratio

Page 26: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 26

Related Work

• Many existing work sharing schemes• Identification occurs at different stages in the

query’s lifetime• All allow pipelined query execution

Materialized Views

[rouss82]

Multiple Query Optimization

[roy00]

Staged DBMS[hariz05]

Synchronized Scanning[lang07]

Schema design

Buffer Pool Access

Query compilation

Query execution

Early Late

Model describes all types of work sharing

Page 27: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 27

Conclusions

• Work sharing can hurt performance• Highly parallel, memory resident machines

• Intuitive analytical model captures behavior• Trade-off between load reduction and critical path

• Model-guided work sharing highly effective• Outperforms static policies by up to 6x

http://www.cs.cmu.edu/~StagedDB/

Page 28: To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki,

Ryan Johnson © Apr 18, 2023 http://www.pdl.cmu.edu/ 28

References

•[hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.

•[lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.•[rouss82] N. Roussopoulos. “View Indexing in Relational databases.” In ACM TODS, 7(2):258-290, 1982.•[roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.

http://www.cs.cmu.edu/~StagedDB/