Efficient Reference Counting for Nested Task- and Data-Parallelism
MMnet ’13, 8 May, Edinburgh
Sven-Bodo Scholz
Context
High Productivity
High Performance
High Portability
Truly Implicit MM is Cool...
    ...
    A = getLargeData( inData);
    Z1 = incArrayBy( A, 1);
    Z2 = incArrayBy( A, 2);
    ...

    int[.,.] incArrayBy( int[.,.] A, int i)
    {
      return( A + i);
    }
Stateless Arrays are Cool...
but Challenging!
    ...
    A = getLargeData( inData);
    Z1 = eraseDiagElem( A, 1);
    Z2 = eraseDiagElem( A, 2);
    ...

    int[.,.] eraseDiagElem( int[.,.] A, int i)
    {
      A[i,i] = 0;
      return( A);
    }
Aggregate Update Problem!
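The problem can be reproduced outside SaC. The Python sketch below is an illustration only, not SaC's implementation: `erase_diag_elem_in_place`, `erase_diag_elem_copy`, `run_naive`, `run_copying`, and the 3x3 sample data are hypothetical stand-ins. It contrasts a naive in-place update, which breaks the stateless semantics because Z1 and Z2 alias A, with a copying update, which is correct but pays O(n) on every call.

```python
import copy


def get_large_data():
    # Small stand-in for getLargeData( inData).
    return [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def erase_diag_elem_in_place(a, i):
    # Naive translation: overwrites the caller's array.
    a[i][i] = 0
    return a


def erase_diag_elem_copy(a, i):
    # Stateless semantics: the argument behaves like a conceptual copy.
    b = copy.deepcopy(a)  # O(n) copy on every call
    b[i][i] = 0
    return b


def run_naive():
    a = get_large_data()
    z1 = erase_diag_elem_in_place(a, 1)
    z2 = erase_diag_elem_in_place(a, 2)  # also changes z1 and a
    return a, z1, z2


def run_copying():
    a = get_large_data()
    z1 = erase_diag_elem_copy(a, 1)
    z2 = erase_diag_elem_copy(a, 2)
    return a, z1, z2
```

In `run_naive` all three names end up referring to the same array, so Z1 loses both diagonal elements. In `run_copying` the results are independent, at the cost of two full copies even though the second call is the last use of A and could have updated it in place.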
Solutions
• version trees, e.g. [AasaEtAL88]
• single threading, e.g. Linear Types [Wadler90], Uniqueness Types [BarendsenSmetsers95]
• non-delayed garbage collection, e.g. λ-calculus [Hudak84], SISAL [Cann89], SaC [Trojahner05, GrelckScholz08]
Design Space for MM
f( a, b, c)
f( a, b, c) { ... a ... a ... b ... b ... c ... }
conceptual copies

| operation | non-delayed copy | delayed copy + delayed GC | delayed copy + non-delayed GC |
| --- | --- | --- | --- |
| read | O(1) + free | O(1) | O(1) + DEC_RC_FREE |
| update | O(1) | O(n) + malloc | O(1) / O(n) + malloc |
| reuse | O(1) | malloc | O(1) / malloc |
| funcall | O(1) / O(n) + malloc | O(1) | O(1) + INC_RC |
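The last column (delayed copy + non-delayed GC) can be sketched with an explicit reference count. This is a minimal Python model under stated assumptions, not the SaC runtime: `RcArray`, `inc_rc`, `dec_rc_free`, `update`, and `demo` are hypothetical names, and setting `data` to `None` stands in for an immediate `free`.

```python
class RcArray:
    """Array descriptor with an explicit reference count (hypothetical)."""

    def __init__(self, data):
        self.data = list(data)  # stands in for malloc + initialisation
        self.rc = 1
        self.freed = False


def inc_rc(arr):
    # funcall: O(1) + INC_RC, no copy is made when passing the array on.
    arr.rc += 1


def dec_rc_free(arr):
    # read: O(1) + DEC_RC_FREE, memory is released as soon as rc hits 0.
    arr.rc -= 1
    if arr.rc == 0:
        arr.data = None
        arr.freed = True


def update(arr, index, value):
    """Consume one reference to arr and return the updated array."""
    if arr.rc == 1:
        arr.data[index] = value  # O(1): sole owner, update in place
        return arr
    result = RcArray(arr.data)   # O(n) + malloc: array is still shared
    result.data[index] = value
    dec_rc_free(arr)
    return result


def demo():
    a = RcArray([1, 2, 3, 4])
    inc_rc(a)             # a is consumed twice below
    z1 = update(a, 1, 0)  # rc == 2: copies
    z2 = update(a, 2, 0)  # rc == 1: reuses a's memory
    return a, z1, z2
```

In `demo` the first update has to copy because a second reference is still outstanding, while the second update is the last use and reuses the memory. This is the "O(1) / O(n) + malloc" entry in the update row.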
Going Multi-Core I
[Figure: rc-ops in single-threaded and data-parallel execution]
local variables do not escape!
relatively free variables can only benefit from reuse in 1/n cases!
=> use thread-local heaps
=> inhibit rc-ops on rel-free vars
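A rough Python sketch of these two measures follows. It is an assumption-laden model rather than the SaC runtime: "relatively free" is read here as "defined outside the parallel region but used inside it", `CountingRcArray` and `data_parallel_map` are hypothetical names, and a per-worker Python list stands in for a thread-local heap.

```python
import threading


class CountingRcArray:
    """Reference-counted array that also counts its rc-ops (hypothetical)."""

    def __init__(self, data):
        self.data = list(data)
        self.rc = 1
        self.rc_ops = 0

    def inc_rc(self):
        self.rc += 1
        self.rc_ops += 1

    def dec_rc(self):
        self.rc -= 1
        self.rc_ops += 1


def data_parallel_map(shared, fn, num_threads=4):
    n = len(shared.data)
    result = [None] * n

    def worker(tid):
        local_heap = []  # stand-in for a thread-local heap
        for i in range(tid, n, num_threads):
            # shared is relatively free: it is read, but its reference
            # count is not touched inside the parallel region.
            tmp = fn(shared.data[i])
            local_heap.append(tmp)  # temporaries never leave this thread
            result[i] = tmp         # each index is written by one thread

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Because the spawning thread keeps its own reference for the whole region, the workers can read `shared` without any counting, so no synchronisation on the reference count is needed.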
Going Multi-Core II
[Figure: rc-ops in single-threaded and task-parallel execution]
local variables do escape!
relatively free variables can benefit from reuse in 1/2 cases!
=> use locking....
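A minimal sketch of the locking approach, assuming one lock per array descriptor. The names `LockedRcArray`, `task`, and `run_tasks` are hypothetical, and the slides do not say how SaC implements the lock.

```python
import threading


class LockedRcArray:
    """Array whose reference count is protected by a lock (hypothetical)."""

    def __init__(self, data):
        self.data = list(data)
        self.rc = 1
        self.freed = False
        self.lock = threading.Lock()

    def inc_rc(self):
        with self.lock:
            self.rc += 1

    def dec_rc_free(self):
        with self.lock:
            self.rc -= 1
            if self.rc == 0:
                self.data = None  # released by whichever task is last
                self.freed = True


def task(arr, results, slot):
    results[slot] = sum(arr.data)
    arr.dec_rc_free()  # the array escaped into this task, so it must count


def run_tasks(num_tasks=8):
    arr = LockedRcArray(range(100))
    results = [None] * num_tasks
    threads = []
    for slot in range(num_tasks):
        arr.inc_rc()  # one reference per spawned task
        threads.append(threading.Thread(target=task,
                                        args=(arr, results, slot)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    arr.dec_rc_free()  # drop the parent's own reference
    return arr, results
```

Every rc-op now takes the lock, which keeps the count correct when tasks finish in any order but makes the reference count a point of contention as the number of tasks grows.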
Going Many-Core
256 cores
500 threads in HW each
functional programmers' paradise, no?!
nested DP and TP parallelism
RC in Many-Core Times
[Figure: computational thread(s) and an RC-thread that performs the rc-ops]
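One way to read this picture is that computational threads hand their rc-ops to a single RC-thread, which applies them one after another so that no lock on the counter is needed. The sketch below models that reading with a FIFO queue. The queue, the `(array, delta)` message format, and the names `rc_thread_main`, `compute`, and `run` are assumptions for illustration, not details taken from the slides.

```python
import queue
import threading


class RcArray:
    def __init__(self, data):
        self.data = list(data)
        self.rc = 1
        self.freed = False


def rc_thread_main(ops):
    # Only this thread ever modifies a reference count.
    while True:
        item = ops.get()
        if item is None:  # shutdown marker
            break
        arr, delta = item
        arr.rc += delta
        if arr.rc == 0:
            arr.data = None
            arr.freed = True


def compute(arr, ops, results, slot):
    results[slot] = sum(arr.data)
    ops.put((arr, -1))  # delegate the decrement instead of doing it here


def run(num_workers=8):
    ops = queue.Queue()
    rc_thread = threading.Thread(target=rc_thread_main, args=(ops,))
    rc_thread.start()

    arr = RcArray(range(100))
    results = [None] * num_workers
    workers = []
    for slot in range(num_workers):
        ops.put((arr, +1))  # queued before the worker starts
        workers.append(threading.Thread(target=compute,
                                        args=(arr, ops, results, slot)))
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    ops.put((arr, -1))  # the parent's own reference
    ops.put(None)
    rc_thread.join()
    return arr, results
```

All increments are queued before any worker can queue its decrement, so the count cannot reach zero early. The price is that every rc-op becomes a message to one thread, which serialises all reference counting.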
and here the runtimes
Multi-Modal RC:
spawn
new runtimes:
Conclusions
• The more cores we use the more MM matters
• Avoiding copying of data can lead to bottlenecks in memory management
• DP is particularly well behaved
• Utilising application knowledge helps a lot
• Multi-Modal-RC is well suited for highly nested parallel applications on large non-nested data structures!
Open Questions:
• Can we improve on the multi-modal version?
  – more modes?
  – more static analysis?
• How should we deal with smaller / nested structures??
• Can we integrate those techniques???