1

Presentation at the 4th PMEO-PDS Workshop

Benchmark Measurements of Current UPC Platforms

Zhang Zhang and Steve Seidel
Michigan Technological University

Denver, Colorado 3/22/2005

2

Presentation Outline

• Background
  – Unified Parallel C, implementations and users.
  – Previous UPC performance studies.
• Experiments
  – Available UPC platforms
  – Benchmarks
• Performance measurements
• Conclusions

3

UPC Overview

• UPC is an extension of C for partitioned shared memory parallel programming.
  – A special case of the shared memory programming model.
  – Similar languages: Co-Array Fortran, Titanium.
  – UPC homepage: http://www.upc.gwu.edu
• Platforms supported:
  – Cray X1, Cray T3E, SGI Origin, HP AlphaServer, HP-UX, Linux clusters, IBM SP.
• UPC compilers:
  – Open source: MuPC, Berkeley UPC, Intrepid UPC
  – Commercial: HP UPC, Cray UPC
• Users:
  – LBNL, IDA, AHPCRC, …
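To make the partitioned shared memory model concrete, the plain C sketch below computes which thread a shared array element has affinity to under UPC's block-cyclic layout: for a declaration of the form `shared [B] T a[N]`, element `i` belongs to thread `(i / B) mod THREADS`. The helper names and the explicit `threads` parameter are illustrative assumptions. In real UPC code the runtime supplies `THREADS`, and `upc_threadof` reports the affinity of a shared pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: thread that owns element i of a shared array
   with block size `block`, distributed round-robin over `threads` threads. */
size_t owner_thread(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Hypothetical helper: position of element i inside the owning
   thread's local portion of the array. */
size_t local_index(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / (block * threads)) * block + i % block;
}
```

With a block size of 2 and 4 threads, elements 0 and 1 land on thread 0, elements 2 and 3 on thread 1, and element 9 wraps back to thread 0 as the fourth element of its local portion. A reference is "local shared" when the owner equals the calling thread and "remote shared" otherwise.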

4

Related UPC Performance Studies

• Performance benchmark suites
  – UPC_Bench (GWU)
    • Synthetic microbenchmark based on the STREAM benchmark.
    • Application benchmarks: Sobel edge detection, matrix multiplication, N-Queens problem
  – UPC NAS Parallel Benchmarks (GWU)
• Performance monitoring
  – Performance analysis for HP UPC compiler (GWU)
  – Performance of Berkeley UPC on HP AlphaServer (Berkeley)
  – Performance of Intrepid UPC on SGI Origin (GWU)

5

Benchmarking UPC Systems

• Extended shared memory bandwidth microbenchmarks to cover various reference patterns:
  – Scalar references: 11 access patterns
  – Block memory operations: 9 access patterns
• Benchmarked six combinations of available UPC compilers and platforms using both the UPC STREAM (MTU code) and the UPC NAS Parallel Benchmarks (GWU code).
  – Compilers: MuPC, HP UPC, Berkeley UPC and Intrepid UPC
  – Platforms: Myrinet Linux cluster, HP AlphaServer SC, and T3E

• The first comparison of performance for currently available UPC implementations.

• The first report on MuPC performance.
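A bandwidth microbenchmark in the STREAM style reports bytes moved divided by elapsed time. The plain C sketch below shows that arithmetic for a copy kernel. It is not the MTU benchmark code: the function names are hypothetical, `clock()` stands in for whatever timer the real benchmark uses, and the loop touches ordinary local memory, not UPC shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Bandwidth in MB/s, with 1 MB = 10^6 bytes as STREAM reports it. */
double bandwidth_mb_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1.0e6;
}

/* Bytes moved by copying n doubles: one read plus one write per element. */
double copy_bytes(size_t n)
{
    return 2.0 * (double)sizeof(double) * (double)n;
}

/* Run the copy kernel once and return the elapsed CPU time in seconds. */
double time_copy(double *dst, const double *src, size_t n)
{
    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

In practice the kernel is repeated over arrays much larger than the cache and the best time is kept, since a single short run can finish below the timer resolution.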

6

Benchmarks

• Synthetic benchmarks:
  – The STREAM microbenchmark was rewritten in UPC with a wider variety of shared memory access patterns:
    • Local shared read / write
    • Unit stride shared read / write / copy
    • Random shared read / write / copy
    • Stride-n shared read / write / copy
    • Block transfers with variations of source and sink affinities.
• NAS Parallel Benchmark Suite v2.4
  – The UPC version was developed at GWU.
  – Five kernels: CG, EP, FT, IS and MG.
  – Two variations: naïve version and hand-tuned version.
  – Input size: Class A workload.
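The unit stride, stride-n and random patterns listed for the synthetic benchmarks differ only in the sequence of indices they touch. The plain C sketch below generates those index sequences and applies a read kernel to them. All names are hypothetical, `rand()` is used purely for illustration, and the array is ordinary local memory, so this shows the shape of the patterns and not the cost of shared references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx[0..count-1] with a stride-n pattern over n_elems elements,
   wrapping around at the end. stride == 1 gives the unit stride pattern. */
void fill_strided(size_t *idx, size_t count, size_t stride, size_t n_elems)
{
    assert(n_elems > 0);
    for (size_t k = 0; k < count; k++)
        idx[k] = (k * stride) % n_elems;
}

/* Fill idx[0..count-1] with pseudo-random positions in [0, n_elems). */
void fill_random(size_t *idx, size_t count, size_t n_elems, unsigned seed)
{
    assert(n_elems > 0);
    srand(seed);
    for (size_t k = 0; k < count; k++)
        idx[k] = (size_t)rand() % n_elems;
}

/* Read kernel: sum the elements selected by idx. */
double read_pattern(const double *a, const size_t *idx, size_t count)
{
    double sum = 0.0;
    for (size_t k = 0; k < count; k++)
        sum += a[idx[k]];
    return sum;
}
```

Over an 8-element array, a stride of 3 visits indices 0, 3, 6, 1 for the first four references. Write and copy kernels follow the same pattern with an assignment in place of the sum.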

7

Local Shared References

• Intrepid UPC: performance is poor on local shared accesses.
• HP UPC: cache state has significant effects on local shared accesses.

8

Remote Shared References

• HP UPC and MuPC: caches help unit stride remote shared accesses.
• Intrepid UPC does the best for remote shared accesses.

9

Block Memory Operations

• HP UPC: performance is poor on certain string functions.
• Intrepid UPC: performance is low in all categories.

10

NPB – CG

• The only case that scales well: Berkeley UPC + optimized code.

11

NPB – EP

12

NPB – FT

• HP, Berkeley and MuPC: performance is comparable.

13

NPB – IS

• HP, Berkeley and MuPC: performance is comparable.

14

NPB – MG

• MG performance is very inconsistent.

15

Conclusions

• STREAM benchmarking:
  – UPC language overhead reduces performance of local shared references.
  – Remote reference caching helps stride-1 accesses.
  – Copying between two locations with the same affinity to a remote thread needs optimization.
• NPB benchmarking:
  – Some implementations failed for some benchmarks. More stable and reliable implementations are needed.
  – Hand-tuning techniques (e.g. prefetching) are critical to performance.
  – Berkeley UPC is the best at handling unstructured, fine-grained references.
  – MuPC experience shows that it will be more rewarding to optimize remote shared references than to improve network interconnects.
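The point about hand-tuning can be illustrated with a toy cost model in plain C. Here a struct stands in for memory with affinity to a remote thread, and a counter records one simulated message per remote operation. Everything in this sketch is a hypothetical illustration, not a measurement and not any UPC runtime's behavior. In real UPC code the bulk fetch would typically be a `upc_memget` call into a private buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for data owned by a remote thread. */
typedef struct {
    const double *data;
    size_t messages;   /* number of simulated remote operations */
} remote_array;

/* Fine-grained access: one simulated message per element read. */
double remote_read(remote_array *r, size_t i)
{
    r->messages++;
    return r->data[i];
}

/* Bulk transfer: one simulated message for the whole block. */
void remote_get_block(remote_array *r, double *dst, size_t first, size_t n)
{
    assert(dst != NULL);
    r->messages++;
    memcpy(dst, r->data + first, n * sizeof(double));
}

/* Naive version: every element is fetched individually. */
double sum_naive(remote_array *r, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += remote_read(r, i);
    return s;
}

/* Hand-tuned version: prefetch the block once, then use the local copy. */
double sum_prefetched(remote_array *r, double *buf, size_t n)
{
    double s = 0.0;
    remote_get_block(r, buf, 0, n);
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```

Both versions compute the same sum, but the naive one issues n simulated messages and the prefetched one issues a single message. That gap is the kind of saving that prefetching or remote reference caching is meant to capture.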

16

Thank you!

For more information:

http://www.upc.mtu.edu
