parallelizing the simplex method with...

HARDWARE ACCELERATED LINEAR PROGRAMMING
Parallelizing the Simplex Method with OpenCL

Bradley de Vlugt, Maysam Mirahmadi, Serguei L. Primak, Abdallah Shami

Introduction

• Linear Programming (LP) models are frequently encountered optimization problems in medicine, engineering and business, where a linear objective function must be minimized or maximized subject to multiple linear constraints.
• High-performance implementations of the Simplex Algorithm make it practical to solve large problems in these disciplines.
• This work proposes an energy-efficient, hardware-accelerated LP solver for a class of problems in radiation therapy. The proposed accelerator is implemented and tested on various devices using OpenCL.

Linear Programming

• The standard form of a linear programming problem is (1), where A represents the constraint system, b the constraint bounds, c the cost function, and x the decision variables.

\min_{x}\; c x \quad \text{s.t.} \quad A x \le b, \quad x \ge 0 \qquad (1)

The Simplex Algorithm

• [...] solutions (BFS), of the convex polyhedron representing the [...]
• The Simplex Algorithm solves LP problems by iterating over subsets of the problem variables to generate improving solutions.
• Each simplex iteration consists of three subroutines: pricing, ratio test, and pivot.
• Pricing finds the direction of the adjacent solution, the ratio test calculates the magnitude of the reduction in the cost function, and pivot updates the problem data structures.
• The three subroutines are performed on the OpenCL Device in a loop managed by the Host processor.

Figure 2: OpenCL Simplex Algorithm Design. a) The Simplex Algorithm (legend: Optimal Solution, Basic Feasible Solution, Possible Simplex Path, Feasible Region, Constraint Line). b) OpenCL Implementation (flowchart split between Host and Device: Start Simplex, Transfer Problem, Pricing, Is Basis Optimal? [no: Ratio Test, Pivot; yes: Read Solution], Stop Simplex).

Application: Radiation Therapy Treatment Planning

• Radiation therapy uses several beams to deliver a radiation dose to tumour cells. Recent literature has formulated the problem of optimizing the beam weights as an LP problem [3-4].
• The affected area is divided into target (t) and normal (n) voxels. The radiation from each beam to each voxel is measured to form the dose matrix D.
• The model (2) minimizes the total dose delivered to non-target areas subject to constraints based on the properties of the tissue in each voxel.
• Vectors x represent the upper (U) and lower (L) bounds for the target and normal areas. Beam weights w are given costs c_N.
• Since each beam delivers radiation to most voxels, the dose matrix of the LP problem is dense.

\min_{w}\; (D_n^{T} c_N)^{T} w \quad \text{s.t.} \quad D_t w \le x^{U}_{t}, \quad D_t w \ge x^{L}_{t}, \quad D_n w \le x^{U}_{n}, \quad w \ge 0 \qquad (2)

Figure 1: Radiation Therapy Beam Configuration [4]

OpenCL

• OpenCL is a language designed by the Khronos Group for portability across hardware accelerators.
• A hardware accelerator, or OpenCL Device, is managed by a Host processor.

Performance Benchmarking

• The OpenCL linear programming engine was tested on multiple devices to measure acceleration (see Table 1 for specifications).
• Effort was made to optimize the OpenCL kernels for each device without losing design portability.

Table 1: OpenCL Test Device Specifications

Device                          Power (W)   Memory Bandwidth (GB/s)
Intel Core i7 4930k (6 cores)   130         59.7
Nvidia GeForce GTX-780          250         288.4
Altera Nallatech PCIe-385N      25          24.0

Figure 3: Speed Up for the OpenCL Solver Over the Sequential Solver (Speed Up vs. Problem Size; series: FPGA vs CPU, GPU vs CPU, CPU OpenCL vs CPU, GPU vs SoPlex).

Figure 4: Energy Efficiency of the OpenCL and Sequential Solvers (Iterations per Unit Energy (mJ^-1) vs. Problem Size; series: FPGA, GPU, CPU OpenCL, CPU).

Design Analysis

• Performance benchmarking reveals speed-ups relative to sequential code that approach 2 on the CPU and 10 on the GPU.
• The FPGA exhibits close to unity speed-up but proved to be the most efficient device in terms of Simplex iterations processed per unit energy, with an efficiency 5 times greater than that of the CPU.
• The design effectively saturates the memory bandwidth of each test device, as shown in Figure 5.

Figure 5: Memory Bandwidth Usage (Memory Bandwidth Usage (%) vs. Problem Size; series: FPGA, GPU, CPU OpenCL).

Conclusions

• A heterogeneous OpenCL system is proposed to solve dense LP problems and is tested on multiple hardware accelerators.
• Test results indicate that the GPU provides the fastest solutions to LP problems, as shown in Figure 3.
• The selected FPGA provides superior energy efficiency for the application, as shown in Figure 4.

References

[1] A. Hamzic, A. Huseinovic, and N. Nosovic, “Implementation and performance analysis of the simplex algorithm adapted to run on commodity OpenCL enabled graphics processors,” in 2011 XXIII International Symposium on Information, Communication and Automation Technologies (ICAT), 2011, pp. 1–7.
[2] S. Bayliss, C.-S. Bouganis, G. Constantinides, and W. Luk, “An FPGA implementation of the simplex algorithm,” in IEEE International Conference on Field Programmable Technology, 2006, pp. 49–56.
[3] A. Olafsson and S. Wright, “Linear programming formulations and algorithms for radiotherapy treatment planning,” Optimization Methods and Software, vol. 21, no. 2, pp. 201–231, April 2006.
[4] H. Romeijn, R. Ahuja, J. Dempsey, and A. Kumar, “A new linear programming approach to radiation therapy treatment planning problems,” Operations Research, vol. 54, no. 2, pp. 201–216, April 2006.
[5] R. Wunderling, “Paralleler und objektorientierter Simplex-Algorithmus,” Ph.D. dissertation, Technische Universität Berlin, 1996, http://www.zib.de/Publications/abstracts/TR-96-09/.

Acknowledgement

This work was supported in part by the Southern Ontario Smart Computing Innovation Platform Consortium (SOSCIP) and IBM Canada Ltd. The authors would like to thank Dr. Mary Fenelon, Dr. Sean Wagner, Mr. Laszlo Ladanyi and Mr. Blair Adamache from IBM for their valuable comments and suggestions.
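
The three subroutines named in the Simplex Algorithm section (pricing, ratio test, pivot) can be illustrated with a small sequential sketch. This is not the authors' OpenCL kernel code: it is a plain-Python dense-tableau version written under stated assumptions. The function name `simplex`, the most-negative-reduced-cost pricing rule, and the requirement that b >= 0 (so the slack variables give a starting basic feasible solution) are all choices made for illustration only.

```python
def simplex(c, A, b, max_iter=1000, eps=1e-9):
    """Minimise c.x subject to A x <= b, x >= 0 (illustrative sketch).

    Assumes every b[i] >= 0 so the slack basis is feasible at the start.
    """
    m, n = len(A), len(c)
    # Tableau rows are [A | I | b]; the slack columns form the initial basis.
    T = [[float(v) for v in A[i]]
         + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])]
         for i in range(m)]
    # Objective row: reduced costs, then minus the current objective value.
    z = [float(v) for v in c] + [0.0] * (m + 1)
    basis = [n + i for i in range(m)]

    for _ in range(max_iter):
        # Pricing: choose the entering column with the most negative reduced cost.
        col = min(range(n + m), key=lambda j: z[j])
        if z[col] >= -eps:
            break  # no improving direction left: the basis is optimal
        # Ratio test: the smallest ratio b_i / a_ij over positive entries
        # bounds the step length and selects the leaving row.
        ratios = [(T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > eps]
        if not ratios:
            raise ValueError("LP is unbounded")
        _, row = min(ratios)
        # Pivot: normalise the pivot row, then eliminate the entering column
        # from every other row and from the objective row.
        p = T[row][col]
        T[row] = [v / p for v in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0.0:
                f = T[i][col]
                T[i] = [v - f * w for v, w in zip(T[i], T[row])]
        f = z[col]
        z = [v - f * w for v, w in zip(z, T[row])]
        basis[row] = col
    else:
        raise RuntimeError("iteration limit reached")

    # Read the solution: basic variables take the right-hand-side values.
    x = [0.0] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, -z[-1]
```

For instance, `simplex([-3, -2], [[1, 1], [1, 3], [1, 0]], [4, 6, 3])` minimises -3x - 2y over a small polygon. The pivot step touches every entry of a dense tableau on each iteration, which is consistent with the poster's observation that the design is limited by memory bandwidth.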

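
The radiation therapy model (2) can be rewritten in the standard form (1) by negating the lower-bound constraints, since D_t w >= x^L_t is equivalent to -D_t w <= -x^L_t. The sketch below assembles the cost vector and the stacked constraint system. The function name `build_standard_form` and the nested-list data layout are hypothetical, and indexing c_N by normal voxel is an assumption inferred from the dimensions of D_n^T c_N rather than something the poster states.

```python
def build_standard_form(D_t, D_n, c_N, xU_t, xL_t, xU_n):
    """Return (c, A, b) such that model (2) reads: min c.w, A w <= b, w >= 0.

    D_t, D_n: dose matrices (rows = target / normal voxels, columns = beams).
    c_N: one cost per normal voxel (assumed layout).
    xU_t, xL_t: upper and lower dose bounds for target voxels.
    xU_n: upper dose bounds for normal voxels.
    """
    beams = len(D_n[0])
    # Cost vector c = D_n^T c_N, one entry per beam weight.
    c = [sum(D_n[v][j] * c_N[v] for v in range(len(D_n))) for j in range(beams)]
    A = [list(row) for row in D_t]            # D_t w <= x^U_t
    A += [[-d for d in row] for row in D_t]   # -D_t w <= -x^L_t
    A += [list(row) for row in D_n]           # D_n w <= x^U_n
    b = list(xU_t) + [-v for v in xL_t] + list(xU_n)
    return c, A, b
```

Because the lower bounds introduce negative right-hand-side entries, the all-slack basis is generally not feasible for the assembled problem, so a solver would need a phase-one step or a similar initialisation before iterating.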

Parallelizing the Simplex Method with OpenCL (sc14.supercomputing.org/sites/all/themes/sc14/files/archive/tech...)
