![Page 1: An Operational Semantics for Skeletonsalpha.di.unito.it/storage/talks/2003_semantics_parco...Enable and automate performance-driven source-to-source optimizations same in/out different](https://reader033.vdocument.in/reader033/viewer/2022051906/5ff89449656730039f05d55d/html5/thumbnails/1.jpg)
An Operational Semantics for Skeletons
Marco Aldinucci
ISTI – CNR, National Research Council
Pisa, Italy
Marco Danelutto
Computer Science Dept., University of Pisa
Pisa, Italy
ParCo 2003, Dresden, Germany
Outline
Skeletons
Semantics – motivations
The schema of semantics
Axioms – rules
Example
Concluding remarks
Skeletons
Skeletons are language constructs
– well-defined input-output behavior
– parallelism exploitation patterns
– (sometimes) can be nested
– several prepackaged implementations
Two main families
– Data Parallel (map, reduce, scan …)
– Task & Stream parallel (farm, pipeline, …)
Motivations
Usually formal functional semantics, informal parallel behavior
Describe skeletons
– in-out relationship (functional behavior)
– parallel behavior
– in a uniform and precise way (non-steady state)
– in a structural way
Theoretical work motivated by concrete needs
– Enable and automate performance-driven source-to-source optimizations (same in/out, different parallel behaviors)
– Compare the expressive power of different skeleton sets
farm (pipe (seq f1) (seq f2))
[Figure: the farm of two pipe (f1, f2) instances mapped onto PE1 – PE4, with sched and gather nodes connected through a channel, network …; e.g. with ASI, f1 = filter, f2 = render]
Notes on slide 5 (DI1): sequential source code is just plugged in. Data items arrive in sequence: we cannot assume data is already distributed, and the data distribution cost is large. Several farm scheduling policies are possible, as well as several data mappings. (DipInf; 31/08/2003)
pipe (map fc (seq f1) fd) (map gc (seq f2) gd)
[Figure: the pipe of two maps on PE1 – PE6; the first map runs three copies of f1 with fc and fd, the second runs three copies of f2 with gc and gd]
Running example language: Lithium
Stream and Data Parallel
– farm, pipe
– map, reduce, D&C, …
Can be freely nested
All skeletons have a stream as in/out
Java-based (skeletons are Java classes)
Implemented and running [FGCS 19(5):2003]
– http://www.di.unipi.it/~marcod/Lithium/ or sourceforge
Macro data-flow run-time
Supports heterogeneous COWs (clusters of workstations)
Includes parallel structure optimization (performance-driven, source-to-source)
The schema of semantics
Axioms, three kinds per skeleton:
1. Describe skeletons within the steady state
2. Mark the begin of stream *
3. Manage the end of stream *
Six rules:
1. Two describing parallel execution (SP, DP)
– Have a cost
2. Four to navigate in the program structure
– No cost, ensure strict execution order
Look only at the SP/DP rules to figure out program performance
The meaning of labels
Labels represent an enumeration of PEs
Two kinds of labels:
On streams, labels represent data mapping:
a stream ⟨x⟩ labeled 3 means x is available on PE3
On arrows, labels represent computation mapping:
an arrow → labeled 4 means such computation is performed by PE4
Re-labeling a stream with O(l,x) means communicating it
The semantics may embed a user-defined policy O(l,x)
Cost depends on the label (topology) and the data item x (size)
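The role of labels and of the policy O(l,x) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Item`, `round_robin`, `relabel` and `size_cost` are hypothetical and do not come from Lithium, and the cost model is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Item:
    """A stream item x together with its label (the PE where it is available)."""
    value: Any
    label: int


def round_robin(n_pes: int) -> Callable[[int, Any], int]:
    """Hypothetical policy O(l, x): move to the next PE, ignoring x."""
    return lambda label, x: (label + 1) % n_pes


def relabel(item: Item, policy, comm_cost):
    """Re-label an item, i.e. communicate it; return the moved item and the cost.

    comm_cost(src, dst, x) is an assumed user-supplied function that depends
    on the labels (topology) and on the data item (size).
    """
    dst = policy(item.label, item.value)
    return Item(item.value, dst), comm_cost(item.label, dst, item.value)


def size_cost(src: int, dst: int, x) -> int:
    """Assumed cost model: free if the item stays put, else proportional to size."""
    return 0 if src == dst else len(x)


# x is available on PE3; round-robin over 4 PEs sends it to PE0.
moved, cost = relabel(Item("abcd", 3), round_robin(4), size_cost)
```

Swapping `round_robin` or `size_cost` for another function changes the mapping or the cost without touching `relabel`, which mirrors the idea of a user-defined policy.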
Axioms (steady state)
1. Apply inner skeleton F ∈ param to the stream head x
a. The arrow label gets the left-hand side stream label
b. Labels on the right-hand side may change (stream items may be bounced elsewhere)
2. Recur on the tail of the stream
3. Expressions 1 & 2 are joined by the :: operator
Lithium axioms (for stream par skeletons)
seq
– Embeds seq code
– Stream unfolded, labels unchanged
farm
a. the stream item is distributed according to the O policy
b. a reference to the tail of the stream follows the head
pipe
a. the arrow label gets the stream label – it happens locally
b. the label doesn’t change – keep the 1st stage ∆1 local
c. a re-labeling R is inserted between the 1st & 2nd stage – it will map the 2nd stage elsewhere
d. the tail is expected from the same source
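The annotations for seq, farm and pipe can be turned into a small Python sketch that unfolds a skeleton program over a stream and records where each function runs. It is a hedged illustration, not Lithium code: the tuple encoding of skeletons, the helpers `steps` and `unfold`, the string labels and both policies (`round_robin_2`, `second_stage`) are assumptions made for the example.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (label of the PE, name of the function run there)


def steps(skel, x, label, farm_o, pipe_o) -> Tuple[List[Step], str]:
    """Unfold skel on the single item x held at `label`."""
    kind = skel[0]
    if kind == "seq":
        # Embedded sequential code: the label is unchanged.
        return [(label, skel[1])], label
    if kind == "farm":
        # The item is distributed according to the O policy.
        return steps(skel[1], x, farm_o(label, x), farm_o, pipe_o)
    if kind == "pipe":
        # The 1st stage stays local; the re-labeling R maps the 2nd stage
        # elsewhere (assumed here to depend on the pipe's input label).
        first, _ = steps(skel[1], x, label, farm_o, pipe_o)
        second, out = steps(skel[2], x, pipe_o(label, x), farm_o, pipe_o)
        return first + second, out
    raise ValueError(f"unknown skeleton: {kind}")


def unfold(skel, xs, label, farm_o, pipe_o) -> List[List[Step]]:
    """Return one plan (list of steps) per stream item."""
    plans = []
    for x in xs:
        if skel[0] == "farm":
            # The tail of the stream follows the head's new label.
            label = farm_o(label, x)
            plan, _ = steps(skel[1], x, label, farm_o, pipe_o)
        else:
            # The tail is expected from the same source.
            plan, _ = steps(skel, x, label, farm_o, pipe_o)
        plans.append(plan)
    return plans


def round_robin_2(label, x):
    """Hypothetical farm policy: alternate between labels "0" and "1"."""
    return str((int(label) + 1) % 2)


def second_stage(label, x):
    """Hypothetical pipe policy: "0" -> "02", "1" -> "12"."""
    return label + "2"


program = ("farm", ("pipe", ("seq", "f1"), ("seq", "f2")))
plans = unfold(program, ["x1", "x2", "x3", "x4"], "1", round_robin_2, second_stage)
labels_used = sorted({lab for plan in plans for lab, _ in plan})
```

With these assumed policies the items alternate between the two pipelines, and `labels_used` collects the distinct labels that appear on streams and on R.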
Lithium axioms (DP skeletons)
Lithium rules overview
sp rules details
Many semantics for each program
– i=j=1 is always possible, i.e. no stream parallelism is exploited
– All of them are “functionally confluent”, i.e. they describe the same in-out relationship
– All of them describe the same parallel behavior, but with different degrees of parallelism
Example (2-ways-2-stages pipeline)
farm (pipe (seq f1) (seq f2)) ⟨x1, x2, x3, x4, x5, x6, x7⟩
• Mark the begin of the stream
• Add the stream label
• Apply the farm inner skeleton to the 1st element
• Recur on the tail and change the stream label
• Assume a round-robin scheduling policy O(l,x) with 2 elements (2 pipelines)
• Iterate the same operation on the whole stream
• farm has now disappeared
• two different labels on streams: 0 and 1
• Apply the pipe inner skeletons (stages) to the item
• A re-labeling operation R is introduced in the middle
• Iterate the same operation on the whole stream
• pipe has now disappeared
• two different labels on streams: 0 and 1
• two different labels on R: 02, 12
Example (continued)
• This formula can no longer be reduced by axioms
• The sp rule can be applied:
“Any rightmost sequence of expressions can be reduced provided the streams exploit different labels”
• In this case the longest sequence includes two expressions, i.e. the max. parallelism degree is 2 (matching the double-pipeline startup phase)
Example (continued)
Due to the re-labeling we have 4 adjacent expressions exploiting different labels: 02, 12, 0, 1, i.e. a max. parallelism degree of 4
The step can be iterated up to the end of the stream
The max. parallelism degree is 4, since no more than 4 different labels appear adjacently (easy to prove)
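The counting argument used in the example can be checked mechanically. The sketch below is an assumption-laden simplification: each expression is reduced to the single label of its next reducible step, and the parallelism degree is taken as the longest run of adjacent expressions with pairwise different labels. The function name `longest_distinct_run` and the two label sequences are illustrative only.

```python
from typing import Dict, Sequence


def longest_distinct_run(labels: Sequence[str]) -> int:
    """Length of the longest run of adjacent, pairwise different labels."""
    best = 0
    start = 0                       # left end of the current run
    last_seen: Dict[str, int] = {}  # label -> index of its latest occurrence
    for i, lab in enumerate(labels):
        if lab in last_seen and last_seen[lab] >= start:
            # The label repeats inside the run: restart just after it.
            start = last_seen[lab] + 1
        last_seen[lab] = i
        best = max(best, i - start + 1)
    return best


# Startup phase: only the two stream labels alternate.
startup = ["0", "1", "0", "1", "0", "1", "0"]
# After one step: x1 and x2 wait on R (02, 12), x3 and x4 are at stage 1.
steady = ["02", "12", "0", "1", "0", "1", "0"]

startup_degree = longest_distinct_run(startup)
steady_degree = longest_distinct_run(steady)
```

The two values correspond to the degrees quoted for the startup phase and for the steady state of the double pipeline.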
Example (continued)
Count parallelism
Count communications
or reason about it
By iterating the SP rule we eventually get
That can be joined to form the output stream
Summary
Operational semantics for skeletons
– Describes both functional and parallel behavior
– User-defined mapping/scheduling
– User-defined comm/comp costs
– General, easy to extend
– No similar results within the skeleton community
Enables performance reasoning
– Skeleton normal-form [PDCS99, FGCS03, web]
– Provably correct automatic optimizations
Formally describe your brand new skeleton and its performance
Mammography app. (lithium)
[Figure: raw vs. optimized version; the optimized one is 15 – 20% better]
Thank you
Questions?
www.di.unipi.it/~aldinuc
Stream skeletons
farm
– functionally the identity!
– a.k.a. parameter sweeping, embarrassingly parallel, replica manager …
– for some other groups it is instead apply-to-all
pipe
– parallel functional composition
– pipe f1 f2 < x > computes f2 ( f1 x )
– f1, f2 executed in parallel on different data items
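The purely functional reading of farm and pipe can be written as ordinary Python functions over lists. This is a minimal sketch of the in/out behavior only, with no parallelism; the combinators `seq`, `farm` and `pipe` below and the stand-in functions `f1`, `f2` are assumptions for illustration, not the Lithium API.

```python
from typing import Callable, Iterable, List

Skeleton = Callable[[Iterable], List]


def seq(f: Callable) -> Skeleton:
    """Wrap sequential code f so that it is applied to every stream item."""
    return lambda stream: [f(x) for x in stream]


def farm(worker: Skeleton) -> Skeleton:
    """Functionally the identity: farming a skeleton does not change its result."""
    return worker


def pipe(stage1: Skeleton, stage2: Skeleton) -> Skeleton:
    """Functional composition: each item x is mapped to f2(f1(x))."""
    return lambda stream: stage2(stage1(stream))


def f1(x):
    return x + 1  # stand-in for the first stage


def f2(x):
    return x * 2  # stand-in for the second stage


with_farm = farm(pipe(seq(f1), seq(f2)))
without_farm = pipe(seq(f1), seq(f2))
out = with_farm([1, 2, 3])
```

Both programs compute the same output stream, which is the "same in/out, different parallel behavior" situation the semantics is meant to distinguish.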
Describe skeletons
Usually only the functional behavior is described
Parallel behavior does matter for performance
Usually performance is described by cost formulas
[Equation: cost formula for scan, T_scan(Op), in terms of n, p, g, l, comm_size and t_Op]
Doesn’t describe the behavior, just the cost
What happens if Op is parallel?
Not compositional
Handmade for each architecture
Data layout not described
Axioms (begin/end of the stream)
Begin of stream marking:
End of stream management:
An example of reduction