Parallelization of the Telemedicine Benchmark for the Xbox 360 Architecture. Howard Wong, SURF-IT Fellow. Professor Jean-Luc Gaudiot, EECS. August 29, 2008. University of California, Irvine. PASCAL: PArallel Systems and Computer Architecture Lab.


Parallelization of the Telemedicine Benchmark for the Xbox 360 Architecture

Howard Wong, SURF-IT Fellow

Professor Jean-Luc Gaudiot, EECS

August 29, 2008

University of California, Irvine

PASCAL: PArallel Systems and Computer Architecture Lab.


Outline

Background (Benchmark, Platform)
Current Work
Methodology (Compiler, Data Set)
Results
Conclusions
Future Work


Background

Why Parallel Programming?
  − Advent of everyday multicomputers
  − Ultimate goal: Auto-parallelization
  − Basic concepts
      − Problems
      − Programming primitives
Telemedicine Benchmark
Platform – Xbox 360
  − 3 Cores
  − Graphics Engine
  − Vector Processing

[Diagram labels: Work, "?", Core 1, Core 2, Core n]


Current Work

Goal: Identify the parallelization process
  − Efficiency measured in performance
  − Performance in relation to load
POSIX threads (pthreads) and OpenMP
Sorting Routines
  − 'fallbackSort'
      − Making search 'brackets'
  − 'mainSort'
      − Dependencies between loop iterations
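
The 'brackets' idea can be illustrated with a minimal pthreads sketch. It is not the benchmark's code: it assumes the data has already been divided into contiguous ranges that do not depend on each other, so each range can be sorted by its own thread. The names `bracket_t`, `sort_bracket`, and `sort_brackets_parallel` are hypothetical, and the standard `qsort` stands in for the real sorting routine.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical description of one bracket: a contiguous slice of the array. */
typedef struct {
    int   *base;   /* first element of the slice */
    size_t count;  /* number of elements in the slice */
} bracket_t;

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Thread entry point: sort one bracket with qsort (a stand-in sort). */
static void *sort_bracket(void *arg) {
    bracket_t *br = arg;
    qsort(br->base, br->count, sizeof(int), cmp_int);
    return NULL;
}

/* Split data[0..n) into n_brackets slices and sort each in its own thread.
   Each slice is sorted independently; the whole array ends up sorted only
   if the slices were already partitioned by value.
   Returns 0 on success, -1 on failure. */
int sort_brackets_parallel(int *data, size_t n, size_t n_brackets) {
    assert(data != NULL || n == 0);
    if (n_brackets == 0) return -1;

    pthread_t *tids = malloc(n_brackets * sizeof *tids);
    bracket_t *brs  = malloc(n_brackets * sizeof *brs);
    if (!tids || !brs) { free(tids); free(brs); return -1; }

    size_t started = 0;
    int rc = 0;
    for (size_t i = 0; i < n_brackets; i++) {
        size_t lo = n * i / n_brackets;
        size_t hi = n * (i + 1) / n_brackets;
        brs[i].base  = data + lo;
        brs[i].count = hi - lo;
        if (pthread_create(&tids[i], NULL, sort_bracket, &brs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    free(brs);
    return rc;
}
```

Calling `sort_brackets_parallel(data, n, 3)` would use one thread per Xbox 360 core. A loop whose iterations depend on each other, as noted for 'mainSort', cannot be split this simply.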


Methodology

Compilation
  − gcc or g++ version 4.2
Data Sets
  − Monkey brain image in PPM format
  − Derived data via netpbm
Test Platform
  − Xbox 360 with Ubuntu Linux

Images courtesy of Neuroscience Center, UC Davis, and Joerg Meyer, Center of GRAVITY, Calit2, UC Irvine.


Initial Results

[Figure: Speedup versus Number of Threads. Compression of brains.ppm; compared to bzip2. Legend: bzip2mod, Linear. X axis: No. of Threads (0 to 4). Y axis: Speedup (0.000 to 3.500).]


Analysis

Possible thread contention
  − 'bitmap' of data as former optimization
  − Optimized for long runs of 0's or 1's
  − Extra mutex locks required
Thread Creation
  − Sorting algorithm called at least 300 times for the large image
  − Thread creation efficiency
  − Thread management structures
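
The need for extra locks around a bitmap can be sketched as follows. This is an illustration under assumptions, not the benchmark's code: it assumes the bitmap packs 32 flags into each word, so two threads that set different bits in the same word still perform a read-modify-write on shared memory and must be serialized. The names `bitmap`, `set_bit_locked`, and `fill_bitmap_two_threads` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_BITS 1024  /* assumption: small bitmap, for illustration only */

static uint32_t bitmap[N_BITS / 32];
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set bit i. The |= is a read-modify-write on a word shared by 32 bits,
   so the whole update is done while holding the mutex. */
void set_bit_locked(unsigned i) {
    assert(i < N_BITS);
    pthread_mutex_lock(&bitmap_lock);
    bitmap[i >> 5] |= (uint32_t)1 << (i & 31);
    pthread_mutex_unlock(&bitmap_lock);
}

typedef struct { unsigned first, step; } stride_t;

/* Thread entry point: set bits first, first+step, first+2*step, ... */
static void *set_stride(void *arg) {
    const stride_t *s = arg;
    for (unsigned i = s->first; i < N_BITS; i += s->step)
        set_bit_locked(i);
    return NULL;
}

/* Two threads set the even and the odd bits, so every word is touched by
   both threads. Returns the number of bits set afterwards, or 0 if a
   thread could not be created. */
unsigned fill_bitmap_two_threads(void) {
    stride_t even = {0, 2}, odd = {1, 2};
    pthread_t a, b;

    for (unsigned w = 0; w < N_BITS / 32; w++)
        bitmap[w] = 0;

    if (pthread_create(&a, NULL, set_stride, &even) != 0)
        return 0;
    if (pthread_create(&b, NULL, set_stride, &odd) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    unsigned count = 0;
    for (unsigned i = 0; i < N_BITS; i++)
        count += (bitmap[i >> 5] >> (i & 31)) & 1u;
    return count;
}
```

Every call to `set_bit_locked` takes the same lock, which keeps the bitmap correct but forces the threads to take turns. That serialization is one plausible form of the contention listed above.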


Results (Cont’d)

[Figure: Speedup versus Load (pbzip2 - 3 Threads). Compared to bzip2; 1/4, 1/2, whole image. X axis: Fraction of Image Processed (0.000 to 1.000). Y axis: Speedup (2.800 to 3.050).]

[Figure: Speedup versus Load (bzip2mod - 2 Threads). Compared to bzip2; 1/4, 1/2, whole image. X axis: Fraction of Image Processed (0.000 to 1.000). Y axis: Speedup (0.630 to 0.690).]


Conclusions & Discussion

Speedup dependent on the load size
Possible improvements
  − Use a 'threadpool'
  − Create other important compression functions
  − Examine alternative algorithms with a parallel mindset
End result
  − Thread creation
  − Thread management overhead
  − Heavy contention
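
The 'threadpool' improvement could look roughly like the sketch below: a fixed set of workers is created once and then reused for every task, instead of creating threads on each call to the sorting routine. This is a minimal illustration, not the authors' design. The pool size of 3 (one worker per core), the queue capacity, and all names (`pool_t`, `pool_init`, `pool_submit`, `pool_destroy`, `pool_demo`) are assumptions.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_THREADS 3   /* assumption: one worker per Xbox 360 core */
#define QUEUE_CAP    64  /* assumption: small fixed-size task queue */

typedef struct { void (*fn)(void *); void *arg; } task_t;

typedef struct {
    pthread_t       workers[POOL_THREADS];
    task_t          queue[QUEUE_CAP];   /* circular buffer of waiting tasks */
    int             head, count;        /* queue position and length */
    int             stop;               /* set once by pool_destroy */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
} pool_t;

/* Worker loop: the thread is created once and reused for every task. */
static void *worker(void *p) {
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->stop)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {         /* stopping and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_cond_signal(&pool->not_full);
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                    /* run the task outside the lock */
    }
}

int pool_init(pool_t *pool) {
    pool->head = pool->count = pool->stop = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    pthread_cond_init(&pool->not_full, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&pool->workers[i], NULL, worker, pool) != 0)
            return -1;  /* sketch only: real code would also clean up here */
    return 0;
}

/* Queue one task; blocks while the queue is full. */
void pool_submit(pool_t *pool, void (*fn)(void *), void *arg) {
    assert(fn != NULL);
    pthread_mutex_lock(&pool->lock);
    while (pool->count == QUEUE_CAP)
        pthread_cond_wait(&pool->not_full, &pool->lock);
    task_t t = { fn, arg };
    pool->queue[(pool->head + pool->count) % QUEUE_CAP] = t;
    pool->count++;
    pthread_cond_signal(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers finish all queued tasks, then join them. */
void pool_destroy(pool_t *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->stop = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->workers[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->not_empty);
    pthread_cond_destroy(&pool->not_full);
}

/* Demo task: increment a shared counter under its own lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static void demo_task(void *arg) {
    pthread_mutex_lock(&demo_lock);
    ++*(int *)arg;
    pthread_mutex_unlock(&demo_lock);
}

/* Run n_tasks small tasks on the pool; returns how many ran, or -1. */
int pool_demo(int n_tasks) {
    pool_t pool;
    int done = 0;
    if (pool_init(&pool) != 0) return -1;
    for (int i = 0; i < n_tasks; i++)
        pool_submit(&pool, demo_task, &done);
    pool_destroy(&pool);
    return done;
}
```

`pool_demo(300)` runs 300 tasks with only three thread creations in total, whereas creating fresh threads for each of the "at least 300" sorting calls would pay the creation cost every time. Whether this removes the slowdown seen in the results is not established here.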


Questions for Future Work

What is the impact of thread creation?
Do the other TMB programs have the same features?
Can vector instructions improve program performance?
Are new, more efficient parallel programming primitives needed for our application?
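
One possible starting point for the first question is to time thread creation in isolation. The sketch below is an assumption-laden microbenchmark, not a measurement from this work: it creates and joins a thread that does nothing, repeats this `rounds` times, and reports the average cost. The function name `avg_thread_create_join_seconds` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* A thread body that does no work, so only creation and joining are timed. */
static void *noop(void *arg) { return arg; }

/* Returns the average wall-clock seconds for one pthread_create plus
   pthread_join pair over `rounds` repetitions, or -1.0 on failure. */
double avg_thread_create_join_seconds(int rounds) {
    assert(rounds > 0);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop, NULL) != 0)
            return -1.0;
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / rounds;
}
```

Calling it with `rounds = 300` mirrors the "at least 300 times" figure from the analysis. Multiplying the result by the number of threads created per sorting call gives a rough lower bound on the overhead added to a run.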


Acknowledgments

Professor Jean-Luc Gaudiot and the PASCAL group
UC Davis Neuroscience Center
Professor Joerg Meyer, Center of GRAVITY, Calit2
Calit2 UROP