Christopher Brown and Kevin Hammond, School of Computer Science, University of St. Andrews, July 2010



Ever Decreasing Circles:

Implementing a skeleton for general Parallel Orbit calculations for use in SymGrid-Par.

A General Orbit Calculation

Explore a solution space given:
  An initial set of values;
  A set of generators.

Used in computational algebra:
  Symmetry of solutions: chemistry, quantum physics, etc.
  Rubik’s Cube (Permutations).

Sequential implementations already exist, concerns about performance.

The Orbit Calculation

f :: Int -> Int
f x = (x+1) `mod` 4

Starting set: {1}
f 1 = 2    Accumulating set: {1, 2}
f 2 = 3    Accumulating set: {1, 2, 3}
f 3 = 0    Accumulating set: {1, 2, 3, 0}
f 0 = 1    Already in the set, so the orbit is complete: {1, 2, 3, 0}
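The walkthrough above can be reproduced with a minimal sketch. The helper name orbitOf is hypothetical (it does not appear in the slides), and it handles only one generator and one starting value; Data.Set is used here for the membership test.

```haskell
import qualified Data.Set as Set

-- Generator from the slides.
f :: Int -> Int
f x = (x + 1) `mod` 4

-- Hypothetical helper: follow a single generator from a start value,
-- emitting each new image until an image is already in the set.
orbitOf :: Ord a => (a -> a) -> a -> [a]
orbitOf g start = start : go (Set.singleton start) start
  where
    go seen x
      | y `Set.member` seen = []                          -- image already known: stop
      | otherwise           = y : go (Set.insert y seen) y
      where
        y = g x
```

Under these assumptions, orbitOf f 1 evaluates to [1,2,3,0], matching the accumulating set shown in the slides.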

State-of-the-art

A sequential version already exists in GAP.
  But we need to be able to compute the orbit of millions of iterations.

Parallel version exists in C using hash tables:
  But fine-tuned to a very specific problem (direct condensation)
  May not be scalable?

There is also a new parallel implementation in GAP (Shpectorov)
  tuple-based implementation
  uses SCSCP and dedicated hash-table servers

We need a general skeleton that can be used for arbitrary orbits

SymGrid-Par


The Orbit - Sequential Version

To our knowledge, this is the first version ever implemented in Haskell.

orbitMul :: (Ord a, Eq a) => [ a -> a ] -> [a] -> [a] -> [a]
orbitMul gens [] set = set
orbitMul gens (t:ts) set = orbitMul gens (ts++new) set'
  where (new, set') = applyGens gens [t] set []
        applyGens = ...
        img = ...

Arguments, in order: generators, queue of tasks, accumulating set of results.

The Orbit - Sequential

genimg :: Eq a => (a->a) -> [a] -> [a] -> ([a], [a])
genimg g queue@(t:ts) set =
  if img `elem` set
    then ([] , set )
    else (img : queue , img : set )   -- add img to new task queue and to set of results
  where img = g t

img represents the generator applied to the task (the image of the generator application).

Need to check for membership in the result set.

The Orbit - Sequential

applyGens :: Eq a => [ a -> a ] -> [a] -> [a] -> [a] -> ([a],[a])
applyGens [] q s q' = (q', s)
applyGens (g:gs) queue set q' = applyGens gs queue set' (q'++queue')
  where (queue', set') = genimg g queue set

Recurse over list of generators.
Pass result of an img into next generator application.
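The three sequential functions can be assembled into a version that runs as written. This is a sketch under assumptions rather than the authors' code: genimg takes a single task instead of a queue and returns only the new image as the new task, applyGens is written as a fold, and the entry point orbitSeq is a hypothetical name.

```haskell
-- Apply one generator to one task. Returns the new task (if the image
-- was not seen before) together with the updated result set.
genimg :: Eq a => (a -> a) -> a -> [a] -> ([a], [a])
genimg g t set
  | img `elem` set = ([], set)
  | otherwise      = ([img], img : set)
  where
    img = g t

-- Apply every generator to one task, threading the result set through.
applyGens :: Eq a => [a -> a] -> a -> [a] -> ([a], [a])
applyGens gens t set = foldl step ([], set) gens
  where
    step (new, s) g =
      let (n, s') = genimg g t s
      in  (new ++ n, s')

-- Hypothetical entry point: the start values seed both the task queue
-- and the result set; the loop ends when the queue is empty.
orbitSeq :: Eq a => [a -> a] -> [a] -> [a]
orbitSeq gens start = go start start
  where
    go []       set = set
    go (t : ts) set = go (ts ++ new) set'
      where
        (new, set') = applyGens gens t set
```

The list-based membership test keeps the Eq-only constraint of the slides, at the cost of a linear scan per image; a Data.Set would need Ord.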

Parallel Orbit

Need a queue to express tasks waiting to be processed.
We need to distribute the queue over available PEs.
We use a Task Farm (master/worker) approach

Task Farm (Master/Worker)

Extending to a Parallel Orbit

The orbit is not quite a true farm, however.
  Results from workers must be accumulated and checked for duplicates… a set?
  Non-duplicates are released as new tasks.
Moreover, we must be sure that the orbit will terminate!

Parallel Orbit

Parallel Orbit Calculation

orbitPar :: ([ a->a ] -> [a] -> [ [a] ]) -> [a->a] -> [a] -> [a]
orbitPar orbitfun gens init = …
  workerProcs = [ process (concat . (Data.List.map (orbitfun gens))) | n <- [1..noPe] ]   -- process abstraction
  toWorker tasks = unshuffle noPe tasks

Parallel Orbit Calculation

orbitPar :: ([ a->a ] -> [a] -> [ [a] ]) -> [a->a] -> [a] -> [a]
orbitPar orbitfun gens init = …
  -- c is the count of potential tasks
  addNewTask set (t:ts) c
    | not (t `member` set) = t : addNewTask (Data.Set.insert t set) ts ((c-1)+nGens)
    | c <= 1               = []
    | otherwise            = addNewTask set ts (c - 1)
  workerProcs = …
  toWorker tasks = …
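How the counter makes the feedback loop terminate can be tried without Eden. In the sketch below a lazily defined list stands in for the worker processes, so nothing runs in parallel. The names orbitLoop, accepted and results are hypothetical, and the first clause (stop when the count reaches zero) is an added guard for an empty start list or an empty generator list; it is not in the slides.

```haskell
import qualified Data.Set as Set

-- Sequential simulation of the master loop. 'accepted' is the stream of
-- released tasks; 'results' is what the workers would send back for them.
orbitLoop :: Ord a => [a -> a] -> [a] -> [a]
orbitLoop gens start = accepted
  where
    nGens    = length gens
    accepted = addNewTask Set.empty (start ++ results) (length start)
    results  = concatMap (\t -> map ($ t) gens) accepted

    -- c counts the items still expected on the input stream.
    addNewTask _ _ c
      | c <= 0 = []                       -- added guard: nothing is pending
    addNewTask _ [] _ = []
    addNewTask set (t : ts) c
      | not (t `Set.member` set) = t : addNewTask (Set.insert t set) ts ((c - 1) + nGens)
      | c <= 1                   = []     -- last pending item was a duplicate
      | otherwise                = addNewTask set ts (c - 1)
```

Each released task is expected to produce nGens results, so the count rises by nGens - 1 for a new value and falls by one for a duplicate; once the last pending item turns out to be a duplicate, the stream is closed without reading further.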

Simple Test Case

Test case that gives similar (tunable) granularities.
Deliver wide range of result values.
Change size of result set by setSize.
All tests seeded with 1.

f1 s n = (fib ((n `mod` 20) + 10) + n) `mod` setSize
f2 s n = (fib ((n `mod` 10) + 20) + n) `mod` setSize
f3 s n = (fib ((n `mod` 19) + 10) + n - 1) `mod` setSize

orbitOnList [] _ = []
orbitOnList (g:gens) list = map g list : orbitOnList gens list
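The test generators can be made self-contained as follows. Two things are assumptions: fib is not shown in the slides, so a naive recursive definition with fib 0 = 0 and fib 1 = 1 is used, and the set size is passed as the first argument instead of being read from a free setSize variable. The helper testGens is hypothetical.

```haskell
-- Naive Fibonacci; assumed, since the slides do not define fib.
-- Its exponential cost is what gives each task some work to do.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = fib (n - 1) + fib (n - 2)

-- The three generators, with the set size as an explicit first argument.
f1, f2, f3 :: Int -> Int -> Int
f1 setSize n = (fib ((n `mod` 20) + 10) + n) `mod` setSize
f2 setSize n = (fib ((n `mod` 10) + 20) + n) `mod` setSize
f3 setSize n = (fib ((n `mod` 19) + 10) + n - 1) `mod` setSize

-- Apply every generator to every element of a list of tasks.
orbitOnList :: [a -> a] -> [a] -> [[a]]
orbitOnList []         _    = []
orbitOnList (g : gens) list = map g list : orbitOnList gens list

-- Hypothetical helper: the generator list for a given set size.
testGens :: Int -> [Int -> Int]
testGens setSize = [f1 setSize, f2 setSize, f3 setSize]
```

With this fib and a set size of 750, orbitOnList (testGens 750) [1] evaluates to [[90],[447],[89]].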

Measurement Framework

Executed on 8-core machine running at 2.66GHz.
4 GB of RAM.
Compiled with GHC 6.82 -O2.
Runtimes are given as an average over 10 runs.
Performance of parallel version against single core parallel version.

Farm Speedup Against Par 1

[Chart not reproduced: speedup (scale 0 to 9) for par 1 to par 8 at setSize 750, 1000, 2000, 4000, 8000, 16000, 32000 and 64000.]

Farm - Trace (64000)

Evaluation of the Task Farm

Good for regular and well-balanced tasks.
Static round-robin distribution.
  May suffer from load imbalance.
  Does not distribute tasks in a request driven way.
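Static round-robin distribution can be sketched in a few lines. The function roundRobin below is a hypothetical stand-in written for illustration; it is not the definition of Eden's unshuffle, only a plain-Haskell function with the behaviour described above.

```haskell
-- Hypothetical round-robin splitter: worker i (counting from 0) receives
-- tasks i, i + n, i + 2n, ... regardless of how long each task takes.
roundRobin :: Int -> [a] -> [[a]]
roundRobin n xs = [ everyNth (drop i xs) | i <- [0 .. n - 1] ]
  where
    everyNth []       = []
    everyNth (y : ys) = y : everyNth (drop (n - 1) ys)
```

For example, roundRobin 3 [1..7] evaluates to [[1,4,7],[2,5],[3,6]]. The assignment is fixed before any task runs, which is why a few expensive tasks landing on the same worker leave the others idle.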

A Workpool Approach

Distributes tasks in a request driven way
  when a task completes, its processor is added to the queue of idle processors
Better for irregular and unbalanced tasks.
Automatically deals with load imbalance.
Still limited by master/worker ratio
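Request-driven distribution can be illustrated with a small sketch. The distribute below is hypothetical: it takes the number of workers as an explicit first argument, which the two-argument distribute on the slides does not, and it assumes requests are worker numbers from 1 to noPe.

```haskell
-- Hypothetical request-driven distributor: the i-th task goes to the
-- worker named by the i-th request, so a worker that finishes early
-- (and therefore requests more often) receives more tasks.
distribute :: Int -> [t] -> [Int] -> [[t]]
distribute noPe tasks requests = [ tasksFor pe | pe <- [1 .. noPe] ]
  where
    tasksFor pe = [ t | (r, t) <- zip requests tasks, r == pe ]
```

For example, distribute 2 "abcd" [1,2,2,1] evaluates to ["ad","bc"]: worker 2 asked twice in a row, so it received two consecutive tasks.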

Workpool

Workpool Speedup Against Par 1

[Chart not reproduced: speedup (scale 0 to 9) for par 1 to par 8 at setSize 750, 1000, 2000, 4000, 8000, 16000, 32000 and 64000.]

Workpool - Trace (64000)

Conclusions

Speedup appears almost linear up to a factor of 8.29 on 8 cores for a set size of 64000.
Workpool is more efficient, and gives better speed ups for larger set sizes.
Workpool may incur slight overhead, noticeable in small set sizes.
Workpool is more balanced for larger set sizes.

Work in Progress

Integrating orbit skeleton into SymGrid-Par.
  Use GAP to compute the computational algebra…
  … Haskell to exploit parallelism.
Application to larger problems
  e.g. the braid orbit
Develop tool support to aid parallel development
  e.g. using refactoring
First of a series of domain-specific parallel skeletons
  duplicate elimination, completion algorithm, chain reduction, …

http://www.symbolic-computation.org


Future Work

Complete SGP integration.
Solve some real symbolic computing problems.
Tool support for sequential -> parallel transformations?
Implement more parallel skeletons:
  Parallel nub?

Parallel Orbit Calculation

orbitPar :: ([ a->a ] -> [a] -> [ [a] ]) -> [a->a] -> [a] -> [a]
orbitPar orbitfun gens init = dat
  where
    newTasks = merge (zipWith (#) workerProcs (toWorker dat))

    dat = (addNewTask empty (init' ++ newTasks) (length init'))
    init' = take noPe (cycle init)

    addNewTask set (t:ts) c
      | not (t `member` set) = t : addNewTask (Data.Set.insert t set) ts c'
      | c <= 1               = []
      | otherwise            = addNewTask set ts (c - 1)
      where c' = (c-1) + nGens

    workerProcs = [ process (concat . (Data.List.map (orbitfun gens))) | n <- [1..noPe] ]

    toWorker tasks = unshuffle noPe tasks

Eden

Semi-explicit model of parallelism.
Explicit process creation.
Implicit thread creation:

(unzip . streamf) :: Num a => [a] -> ([a],[a])

uncurry zip (process (unzip . streamf) # [1..10])
  where streamf args = map worker args
        worker x = (factorial x, fibonacci x)
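The dataflow of this example can be tried in plain GHC with sequential stand-ins. Everything named here is an assumption for illustration: processSeq and instantiateSeq replace Eden's process and # and provide no parallelism, and factorial and fibonacci are given simple definitions because the slides do not show them.

```haskell
-- Sequential stand-ins for Eden's process abstraction and its
-- instantiation operator; they mimic the dataflow only.
processSeq :: (a -> b) -> (a -> b)
processSeq = id

instantiateSeq :: (a -> b) -> a -> b
instantiateSeq p x = p x

-- Assumed definitions of the two worker functions.
factorial :: Integer -> Integer
factorial n = product [1 .. n]

fibonacci :: Integer -> Integer
fibonacci n = fst (iterate (\(a, b) -> (b, a + b)) (0, 1) !! fromIntegral n)

worker :: Integer -> (Integer, Integer)
worker x = (factorial x, fibonacci x)

streamf :: [Integer] -> [(Integer, Integer)]
streamf = map worker

-- The process returns a pair of lists; in Eden each component of the
-- pair would be sent back by its own thread. Here they are zipped again.
results :: [(Integer, Integer)]
results = uncurry zip (processSeq (unzip . streamf) `instantiateSeq` [1 .. 10])
```

With these definitions, results holds ten pairs, starting with (1,1) and ending with (3628800,55).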

Questions?

http://www.symbolic-computation.org/The_SCIEnce_Project

Workpool

orbitPar :: ([ a->a ] -> [a] -> [ [a] ]) -> [a->a] -> [a] -> [a]
orbitPar orbitfun gens init = dat
  where
    (newReqs, newTasks) = (unzip . merge) (zipWith (#) workerProcs (toWorker dat))

    dat = (addNewTask empty (init' ++ newTasks) (length init'))
    init' = take noPe (cycle init)

    addNewTask set (t:ts) c
      | not (t `member` set) = t : addNewTask (Data.Set.insert t set) ts c'
      | c <= 1               = []
      | otherwise            = addNewTask set ts (c - 1)
      where c' = (c-1) + nGens

    workerProcs = [ process (zip [n,n..] . (concat . (Data.List.map (orbitfun gens)))) | n <- [1..noPe] ]

    toWorker tasks = distribute tasks requests

    requests = initialReqs ++ newReqs
