Agenda: Project discussion; Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design, S. Eyerman, L. Eeckhout, ISCA'10 [pdf]; Benchmarking guidelines; Regular vs. irregular parallel applications

Upload: amanda-day

Post on 12-Jan-2016


Page 1: Agenda Project discussion Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design, S. Eyerman, L. Eeckhout, ISCA'10 pdf

Agenda

• Project discussion
• Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design, S. Eyerman, L. Eeckhout, ISCA'10 [pdf]
• Benchmarking guidelines
• Regular vs. irregular parallel applications

Page 2

Last time: Amdahl’s law

$$\text{Speedup} = \frac{1}{\dfrac{1 - F}{1} + \dfrac{F}{N}}$$

[Diagram: execution time split into a 1 - F part and an F part.]

Under what assumptions?

• Code is infinitely parallelizable
• No parallelization overheads
• No synchronization
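The formula on this slide can be evaluated directly. The following is a minimal sketch, assuming F is the parallelizable fraction of the work and N the number of cores; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's law: Speedup = 1 / ((1 - F) / 1 + F / N).

    f: fraction of the work that can be parallelized (0 <= f <= 1)
    n: number of cores (n >= 1)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_time = (1.0 - f) / 1.0   # serial part runs on one core
    parallel_time = f / n           # parallel part is split evenly over n cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Print the speedup for a 99% parallel program at several core counts.
    for cores in (1, 2, 16, 256):
        print(cores, amdahl_speedup(0.99, cores))
```

As N grows, the speedup approaches 1 / (1 - F), which is why the serial fraction dominates at large core counts.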

Page 3

Assuming multiple BCEs. Q: How to design a multicore for maximum speedup?

• Assumed Perf(R) = square root of R
• Two problems
  – symmetric / asymmetric multicore chips
  – area allocations

(symmetric) Sixteen 1-BCE cores
(symmetric) Four 4-BCE cores
(symmetric) One 16-BCE core

Page 4

For Asymmetric Multicore Chips

• Serial Fraction 1-F same, so time = (1 – F) / Perf(R)
• Parallel Fraction F
  – One core at rate Perf(R)
  – N-R cores at rate 1
  – Parallel time = F / (Perf(R) + N - R)
• Therefore, w.r.t. one base core:

$$\text{Asymmetric Speedup} = \frac{1}{\dfrac{1 - F}{\text{Perf}(R)} + \dfrac{F}{\text{Perf}(R) + N - R}}$$
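The asymmetric speedup above can be explored numerically. This is a minimal sketch, assuming Perf(R) = sqrt(R) as on the earlier slide, a chip budget of N BCEs, and one big core built from R BCEs; the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed performance of a core built from r BCEs: sqrt(r)."""
    return math.sqrt(r)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of an asymmetric chip relative to one base core.

    f: parallel fraction, n: total BCEs, r: BCEs spent on the one big core.
    The remaining n - r BCEs are used as base cores running at rate 1.
    """
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    serial_time = (1.0 - f) / perf(r)        # serial code runs on the big core
    parallel_time = f / (perf(r) + n - r)    # big core plus n - r base cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Scan all big-core sizes for a 256-BCE chip and report the best one.
    n, f = 256, 0.975
    best_r = max(range(1, n + 1), key=lambda r: asymmetric_speedup(f, n, r))
    print(best_r, asymmetric_speedup(f, n, best_r))
```

With R = 1 the expression reduces to plain Amdahl's law on N base cores, which is a convenient sanity check.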

Page 5

[Figure, for 256 BCEs. Labeled design points: 256 cores, 253 cores, 241 cores, 193 cores, 1 core.]

Page 6

Amdahl assumptions

• Code is infinitely parallelizable
• No parallelization overheads
• No synchronization
  – Add synchronization. Randomly entered (?!)

$f_{seq} + f_{par} = 1$

$f_{seq} + f_{par,ncs} + f_{par,cs} = 1$

Page 7

$f_{seq} + f_{par,ncs} + f_{par,cs} = 1$

Average time in critical sections

Paper also derives an estimate for max time in critical sections

Page 8

$f_{seq}$

$f_{par,cs} P_{cs} P_{ctn}$

$f_{par,cs} (1 - P_{cs} P_{ctn}) / N$

$f_{par,ncs} / N$
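One way to read the four terms on this slide is as the components of normalized execution time on N cores. The sketch below makes that assumption explicit: it sums the four terms and inverts the result to get a speedup. Reading P_cs as the probability of being in a critical section and P_ctn as the probability of contention is also an assumption here, and the function names are illustrative. The max-time estimate mentioned on the previous slide is not covered.

```python
def normalized_time(f_seq: float, f_par_ncs: float, f_par_cs: float,
                    p_cs: float, p_ctn: float, n: int) -> float:
    """Sum of the four slide terms (assumed to be normalized execution time)."""
    if abs(f_seq + f_par_ncs + f_par_cs - 1.0) > 1e-9:
        raise ValueError("f_seq + f_par_ncs + f_par_cs must equal 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    contended = f_par_cs * p_cs * p_ctn                # serialized critical-section time
    uncontended = f_par_cs * (1.0 - p_cs * p_ctn) / n  # critical sections that overlap
    non_critical = f_par_ncs / n                       # parallel code outside critical sections
    return f_seq + contended + uncontended + non_critical


def speedup(f_seq: float, f_par_ncs: float, f_par_cs: float,
            p_cs: float, p_ctn: float, n: int) -> float:
    """Speedup over one core under the summed-terms assumption."""
    return 1.0 / normalized_time(f_seq, f_par_ncs, f_par_cs, p_cs, p_ctn, n)


if __name__ == "__main__":
    # 1% sequential code, 10% of time in critical sections, varying contention.
    for p_ctn in (0.0, 0.1, 0.5, 1.0):
        print(p_ctn, speedup(0.01, 0.89, 0.10, 0.10, p_ctn, 256))
```

With no time in critical sections the sum reduces to Amdahl's law, and with N = 1 it reduces to 1.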

Page 9

Speedup for an asymmetric processor as a function of the big core size (b) and small core size (s) for different contention rates, assuming 256 BCEs. Fraction spent in sequential code 1%.

Page 10

Page 11

Design space exploration across symmetric, asymmetric and ACS multicore processors

Varying the fraction of the time spent in critical sections and their contention rates.

Fraction spent in sequential code equals 1%

ACS = Accelerated critical section

Page 12

Agenda

• Project discussion
• Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design, S. Eyerman, L. Eeckhout, ISCA'10 [pdf]
• 12 ways to fool the masses
• Regular vs. irregular parallel applications

Page 13

If you were plowing a field, which would you rather use? Two oxen, or 1024 chickens? (Attributed to S. Cray)

Page 14

David H. Bailey, “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers”, Supercomputing Review, August 1991.

1. Quote only 32-bit performance results, not 64-bit results.
2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.
3. Quietly employ assembly code and other low-level language constructs.
4. Scale up the problem size with the number of processors, but omit any mention of this fact.
5. Quote performance results projected to a full system.
6. Compare your results against scalar, unoptimized code on Crays.
7. When direct run time comparisons are required, compare with an old code on an obsolete system.
8. If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation.
9. Quote performance in terms of processor utilization, parallel speedups or MFLOPS per dollar.
10. Mutilate the algorithm used in the parallel implementation to match the architecture.
11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.
12. If all else fails, show pretty pictures and animated videos, and don't talk about performance.

Page 15

Roadmap

• Project discussion
• Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design, S. Eyerman, L. Eeckhout, ISCA'10 [pdf]
• 12 ways to fool the masses
• Regular vs. irregular parallel applications

Page 16

Definitions

• Regular applications
  – key data structures are
    • vectors
    • dense matrices
  – simple access patterns
    • (e.g.) array indices are affine functions of for-loop indices
  – examples:
    • MMM, Cholesky & LU factorizations, stencil codes, FFT, …
• Irregular applications
  – key data structures are
    • lists, priority queues
    • trees, DAGs, graphs
    • usually implemented using pointers or references
  – complex access patterns
  – examples: see next slide

Page 17

Regular application example: Stencil computation

• (e.g.,) Finite-difference method for solving PDEs
  – discrete representation of domain: grid
• Values at interior points are updated using values at neighbors
  – values at boundary points are fixed
• Data structure:
  – dense arrays
• Parallelism:
  – values at next time step can be computed simultaneously
  – parallelism is not dependent on runtime values
• Compiler can find the parallelism
  – spatial loops are DO-ALL loops

//Jacobi iteration with 5-point stencil
//initialize array A
for time = 1, nsteps
    for <i,j> in [2,n-1]x[2,n-1]
        temp(i,j) = 0.25*(A(i-1,j)+A(i+1,j)+A(i,j-1)+A(i,j+1))
    for <i,j> in [2,n-1]x[2,n-1]
        A(i,j) = temp(i,j)

[Figure: Jacobi iteration, 5-point stencil. Array A holds the values at time step t_n; array temp holds the values at time step t_n+1.]
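The Jacobi pseudocode on this slide can be turned into a runnable sketch. The version below is plain Python on nested lists, uses 0-based indices (so the interior is rows and columns 1 to n-2 instead of 2 to n-1), and returns a new grid instead of updating A in place; the function name `jacobi` is illustrative.

```python
def jacobi(a: list[list[float]], nsteps: int) -> list[list[float]]:
    """Run nsteps Jacobi iterations with a 5-point stencil.

    Boundary values stay fixed; each interior point becomes the average
    of its four neighbors from the previous time step.
    """
    rows, cols = len(a), len(a[0])
    current = [row[:] for row in a]          # copy so the input is not modified
    for _ in range(nsteps):
        temp = [row[:] for row in current]   # boundary values carried over unchanged
        # Every (i, j) update reads only `current`, so these two loops are
        # DO-ALL loops: all interior points could be computed simultaneously.
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                temp[i][j] = 0.25 * (current[i - 1][j] + current[i + 1][j]
                                     + current[i][j - 1] + current[i][j + 1])
        current = temp
    return current


if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
            [1.0, 1.0, 1.0, 1.0]]
    for row in jacobi(grid, 10):
        print(row)
```

Because the set of independent updates is known from the loop bounds alone, the available parallelism does not depend on the values stored in the grid.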

Page 18

Delaunay Mesh Refinement

• Iterative refinement to remove badly shaped triangles:

    while there are bad triangles do {
        Pick a bad triangle;
        Find its cavity;
        Retriangulate cavity; // may create new bad triangles
    }

• Don’t-care non-determinism:
  – final mesh depends on order in which bad triangles are processed
  – applications do not care which mesh is produced
• Data structure:
  – graph in which nodes represent triangles and edges represent triangle adjacencies
• Parallelism:
  – bad triangles with cavities that do not overlap can be processed in parallel
  – parallelism is dependent on runtime values
    • compilers cannot find this parallelism
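The parallelism condition above (cavities that do not overlap) can be illustrated without a full mesh implementation. The sketch below assumes each bad triangle's cavity has already been computed and is given as a set of triangle ids; it greedily selects bad triangles whose cavities are pairwise disjoint, which is one round of work that could run in parallel. The function name and the input representation are hypothetical, and this is not the refinement algorithm itself.

```python
def independent_round(cavities: dict[int, set[int]]) -> list[int]:
    """Greedily pick bad triangles whose cavities do not overlap.

    cavities maps a bad triangle id to the set of triangle ids in its cavity.
    Triangles are considered in insertion order, so a different order can
    give a different (equally valid) selection.
    """
    claimed: set[int] = set()   # triangles already owned by a selected cavity
    chosen: list[int] = []
    for bad_triangle, cavity in cavities.items():
        if claimed.isdisjoint(cavity):
            chosen.append(bad_triangle)
            claimed |= cavity
    return chosen


if __name__ == "__main__":
    # Cavities of triangles 1 and 4 share triangle 3, so they conflict.
    example = {1: {1, 2, 3}, 4: {3, 4}, 7: {7, 8}}
    print(independent_round(example))
```

Which triangles end up in a round depends on the cavity contents, which exist only at run time; this is the sense in which a compiler cannot find the parallelism ahead of time.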