![Page 1: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/1.jpg)
Evaluation of Offset Assignment Heuristics
Johnny Huynh, Jose Nelson Amaral, Paul Berube
University of Alberta, Canada
Sid-Ahmed-Ali Touati
Universite de Versailles, France
![Page 2: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/2.jpg)
Outline
• Background
• Traditional Approach to Offset Assignment
  • Simple Offset Assignment
  • Address-Register Assignment
• Improving the Problem Model
  • Optimal Address-Code Generation
  • Memory Layout Permutations
• Evaluating Current Heuristics
  • Methodology
  • Results
• Conclusions and Future Work
![Page 4: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/4.jpg)
Background
• Digital Signal Processors (DSPs) have few general-purpose registers
• Program variables are kept in memory
• Address Registers (ARs) are used to access variables
• After a variable is accessed, the AR can be auto-incremented (or decremented) by one word in the same cycle.
![Page 5: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/5.jpg)
Processor Model
• Texas Instruments TMS320C54X DSP family:
  • Accumulator-based DSP
  • 8 Address Registers
  • Initializing an address register requires 2 cycles of overhead
  • Explicit address computations require 1 cycle of overhead
  • Using auto-increment (or auto-decrement) has no overhead.
![Page 6: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/6.jpg)
Processor Model
Example: add ‘A’ and ‘B’, store in accumulator

Explicit address computation (memory layout: A at 0x1000, C at 0x1001, B at 0x1002):
    $AR0 = &A
    $ACC = *$AR0
    $AR0 = $AR0 + 2
    $ACC += *$AR0

Auto-increment (memory layout: A at 0x1000, B at 0x1001, C at 0x1002):
    $AR0 = &A
    $ACC = *$AR0++
    $ACC += *$AR0
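The cycle costs of the two code sequences can be tallied with a short sketch. It assumes a single address register, word-addressed memory, and the overheads listed for the processor model (2 cycles to initialize a register, 1 cycle per explicit address computation, none for auto-increment or auto-decrement). The function and constant names are illustrative and do not come from the slides.

```python
INIT_CYCLES = 2      # initializing an address register
EXPLICIT_CYCLES = 1  # one explicit address computation

def address_overhead(accesses, address_of):
    """Overhead cycles for one address register visiting `accesses` in order."""
    overhead = INIT_CYCLES
    for current, following in zip(accesses, accesses[1:]):
        # A step of at most one word is covered by auto-increment/decrement.
        if abs(address_of[following] - address_of[current]) > 1:
            overhead += EXPLICIT_CYCLES
    return overhead

layout_acb = {"A": 0x1000, "C": 0x1001, "B": 0x1002}
layout_abc = {"A": 0x1000, "B": 0x1001, "C": 0x1002}

print(address_overhead(["A", "B"], layout_acb))  # 3: initialization + one explicit add
print(address_overhead(["A", "B"], layout_abc))  # 2: initialization only
```

Under these assumptions the only difference between the two layouts is the one-cycle explicit addition, which is the overhead that offset assignment tries to remove.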
![Page 8: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/8.jpg)
The Offset-Assignment Problem
• Given k address registers and a basic block accessing n variables, find a memory layout that minimizes address-computation overhead.
  • How should the variables be placed in memory?
  • Which register should access each variable?
![Page 10: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/10.jpg)
Traditional Approach to Offset Assignment
[Figure: flow diagram. Basic Block → Generate Access Sequence → Access Sequence → Address Register Assignment → Sub-Sequences → Simple Offset Assignment (one per sub-sequence) → Sub-Layouts → Address-Code Generation → Address-Computation Overhead]
![Page 11: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/11.jpg)
Traditional Approach: Simple Offset Assignment (SOA)
• In 1992, Bartley introduced the simplest form of the offset assignment problem:
  Given a single address register and a basic block with n variables, find a memory layout that minimizes overhead.
• Equivalent to finding a maximum weight path cover (NP-complete)
• Many researchers have proposed heuristics for this problem:
  • Liao et al. (1996)
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)
![Page 12: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/12.jpg)
Simple Offset Assignment (SOA)
• Fix the access sequence
• Assume only one address register (k = 1)
• Find an ordering of variables in memory (memory layout) that has minimum overhead.
[Figure: access graph over variables A to F; the labelled edges have weight 2]
Ex.
Access Sequence: ‘a d b e c f b e c f a d’
![Page 14: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/14.jpg)
Simple Offset Assignment (SOA)
• Create Access Graph G = (V, E)
  • V = variables
  • The weight of an edge is the frequency of consecutive accesses
• A path defines a memory layout: find the Maximum Weight Path Cover
• NP-Complete!
[Figure: access graph over variables A to F; the labelled edges have weight 2]
Ex.
Access Sequence: ‘a d b e c f b e c f a d’
Memory Layout: d a f c e b
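The access graph for this example can be built in a few lines. The sketch below is illustrative only (the helper names are not from the slides): it counts how often each pair of variables is accessed consecutively, then sums the weight of the edges that a layout places next to each other. Any transition not covered by the layout's path needs an explicit address computation.

```python
from collections import Counter

def access_graph(sequence):
    """Edge weight = number of times two distinct variables are accessed consecutively."""
    weights = Counter()
    for u, v in zip(sequence, sequence[1:]):
        if u != v:
            weights[frozenset((u, v))] += 1
    return weights

def covered_weight(graph, layout):
    """Total weight of the edges between variables that are neighbours in the layout."""
    return sum(graph[frozenset(pair)] for pair in zip(layout, layout[1:]))

sequence = "adbecfbecfad"
graph = access_graph(sequence)
total = sum(graph.values())

print(graph[frozenset("ad")])                    # 2
print(covered_weight(graph, "dafceb"))           # 9
print(total - covered_weight(graph, "dafceb"))   # 2 transitions left uncovered
print(total - covered_weight(graph, "abcdef"))   # 11: alphabetical order covers nothing
```

For this sequence the layout d a f c e b covers 9 of the 11 transitions, so only two accesses need an explicit address computation with a single register.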
![Page 15: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/15.jpg)
Traditional Approach: General Offset Assignment (GOA)
• Problem presented by Liao et al. in 1996.
• Given k address registers and a basic block with n variables, find an assignment of variables to address registers that minimizes the total overhead of all registers.
• This problem formulation is more accurately described as Address-Register Assignment (ARA).
• Consists of SOA problems, and is at least NP-hard.
• Many researchers have proposed heuristics for address-register assignment:
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)
  • Zhuang et al. (2003)
![Page 16: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/16.jpg)
General Offset Assignment (GOA)
• Fix the access sequence
• Allow multiple address registers (k > 1)
• Find an ordering of variables in memory (memory layout) that has minimum overhead.
• Assign each variable to an address register to form access sub-sequences.
[Figure: access graph over variables A to F; the labelled edges have weight 2]
Ex.
Access Sequence: ‘a d b e c f b e c f a d’
Sub-sequence 1: ‘a b c b c a’
Sub-sequence 2: ‘d e f e f d’
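Forming the sub-sequences is a simple projection of the access sequence onto each register's variables. The sketch below assumes the variable-to-register assignment is already known (here the one used in the example); choosing that assignment is the hard part and is not shown. The function name is illustrative.

```python
def split_by_register(sequence, register_of):
    """Project the access sequence onto the variables owned by each register."""
    sub_sequences = {}
    for variable in sequence:
        sub_sequences.setdefault(register_of[variable], []).append(variable)
    return {reg: "".join(accesses) for reg, accesses in sub_sequences.items()}

register_of = {"a": "AR0", "b": "AR0", "c": "AR0",
               "d": "AR1", "e": "AR1", "f": "AR1"}

print(split_by_register("adbecfbecfad", register_of))
# {'AR0': 'abcbca', 'AR1': 'defefd'}
```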
![Page 18: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/18.jpg)
General Offset Assignment (GOA)
[Figure: access graph split into two sub-graphs, {A, B, C} and {D, E, F}; the labelled edges have weight 2]
Ex.
Access Sequence: ‘a d b e c f b e c f a d’
Memory Layouts: [a, b, c] [d, e, f]
• Each sub-sequence can be viewed as an independent SOA problem.
• Solve each sub-sequence as an independent SOA problem.
• It is more appropriate to call this problem the Address Register Assignment (ARA) problem.
• It requires solving SOA instances, so it is at least NP-hard.
![Page 26: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/26.jpg)
Address-Code Generation
• Recall that variables are assigned to address registers.
• There is nothing left to decide: each address register has a defined sequence of accesses.
• This imposes the restriction that all accesses to a variable are made by a single address register.
[Figure: access graph split into two sub-graphs, {A, B, C} and {D, E, F}; the labelled edges have weight 2]
Ex.
Access Sequence: ‘a d b e c f b e c f a d’
Memory Layouts: a b c (AR0), d e f (AR1)
*Requires Explicit Address Computations
![Page 27: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/27.jpg)
Traditional Approach to Offset Assignment
Access sequence: ‘a d b e c f b e c f a d’
Address Register Assignment splits it into two sub-sequences, and Simple Offset Assignment produces one sub-layout for each:
• ‘a b c b c a’ → [a, b, c] (sub-sequence and memory layout accessed by AR0)
• ‘d e f e f d’ → [d, e, f] (sub-sequence and memory layout accessed by AR1)
![Page 29: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/29.jpg)
Optimal Address-Code Generation
• Given a fixed access sequence and memory layout, it is possible to generate optimal addressing-code in polynomial time:
• Minimum-Cost Circulation (Gebotys, 1997)
• Minimum-Weight Perfect Matching (Udayanarayanan, 2000)
![Page 30: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/30.jpg)
Optimal Address-Code Generation
• Build a network-flow graph
• Vertices represent variable accesses
• For each access ai that occurs before another access aj, there is an edge (ai, aj) (not all are shown in the graph).
• Edges represent an opportunity for a register to access variables.
• Each unit of flow represents the accesses performed by an address register.
• Optimal address code is found by finding a minimum-cost circulation.
[Figure: network-flow graph with source S, sink T, and access vertices a1 to a12 for the access sequence, drawn against the memory layout F E D A C B; two unit flows are labelled AR1 and AR2. Annotations in the figure:]
• Outbound edges from S: cost = initialization overhead
• Inbound edges to T: cost = 0
• Edge from T back to S: capacity = number of ARs, cost = 0
• Other edge costs depend on the distance between the variables accessed
• All vertices require one unit of flow
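For very small inputs, the quantity that the circulation minimizes can be cross-checked by exhaustive search. The sketch below is not the minimum-cost circulation formulation; it is a memoized brute-force search over which register serves each access, under assumed costs taken from the processor model (2 cycles to initialize a register, 1 cycle for a jump of more than one word, 0 otherwise). All names are illustrative.

```python
from functools import lru_cache

INIT_COST = 2      # assumed: cycles to initialize an address register
EXPLICIT_COST = 1  # assumed: cycles for an explicit address computation

def min_overhead(sequence, layout, num_registers):
    """Minimum overhead for a fixed access sequence and a fixed memory layout."""
    position = {variable: i for i, variable in enumerate(layout)}
    addresses = [position[variable] for variable in sequence]

    @lru_cache(maxsize=None)
    def best(i, registers):
        # `registers` is a sorted tuple of the addresses last accessed
        # by the registers that have been initialized so far.
        if i == len(addresses):
            return 0
        target = addresses[i]
        options = []
        for j, held in enumerate(registers):
            # Reuse register j: free if the target is within one word.
            step = 0 if abs(target - held) <= 1 else EXPLICIT_COST
            moved = tuple(sorted(registers[:j] + (target,) + registers[j + 1:]))
            options.append(step + best(i + 1, moved))
        if len(registers) < num_registers:
            # Bring a fresh register into use at the target address.
            fresh = tuple(sorted(registers + (target,)))
            options.append(INIT_COST + best(i + 1, fresh))
        return min(options)

    return best(0, ())

print(min_overhead("acac", "abc", 1))  # 5: one init plus three explicit jumps
print(min_overhead("acac", "abc", 2))  # 4: two inits, no explicit jumps
overhead = min_overhead("adbecfbecfad", "abcdef", 2)  # the running example
```

The search is exponential in the number of registers and only practical for toy cases, which is why a polynomial-time formulation such as the circulation above matters. Note that, unlike the traditional approach, nothing here ties a variable to a single register.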
![Page 31: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/31.jpg)
Traditional Approach to Offset Assignment
[Figure: the same flow diagram (Access Sequence → Address Register Assignment → Sub-Sequences → Simple Offset Assignment → Sub-Layouts → Address-Code Generation → Address-Computation Overhead), annotated with the status of each stage:]
• Address Register Assignment: NP-Hard
• Simple Offset Assignment: NP-Complete
• Address-Code Generation: Solved, but not used!
![Page 32: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/32.jpg)
Memory Layout Permutations (MLP)
• Since optimal address-code generation algorithms exist, they can be applied after a memory layout is formed (by traditional approaches).
• However, the traditional approach generates multiple sub-layouts that were originally assumed to be independent.
• How is a single memory layout formed from a set of sub-layouts?
![Page 33: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/33.jpg)
Memory Layout Permutations
• Let $M_i$ be a memory sub-layout.
• Let $M_i^r$ be the reciprocal of $M_i$ (the same sub-layout in reverse order).
• Given an access sequence and $m$ memory sub-layouts, arrange $\{(M_1 | M_1^r), \ldots, (M_m | M_m^r)\}$ such that overhead is minimum when the sub-layouts are placed contiguously in memory.
• Given $m$ sub-layouts, there are $\frac{(m!)(2^m)}{2}$ unique permutations.
![Page 34: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/34.jpg)
Memory Layout Permutations
Example:
Access sequence: ‘a d b e c f b e c f a d’
Address Register Assignment: ‘a b c b c a’ and ‘d e f e f d’ (this is an optimal address register assignment)
Simple Offset Assignment: {a, b, c} and {d, e, f} (these are optimal simple offset assignments)
Memory Layout Permutations (all possible permutations, each listed with its mirror image; all have cost > 4):
• [a, b, c, d, e, f], [f, e, d, c, b, a]
• [c, b, a, d, e, f], [f, e, d, a, b, c]
• [a, b, c, f, e, d], [d, e, f, c, b, a]
• [c, b, a, f, e, d], [d, e, f, a, b, c]
Optimal Layout: {b, c, a, d, e, f} with cost = 4 is not found
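The permutations in this example can be enumerated mechanically. The sketch below (illustrative names, not from the slides) orders the sub-layouts in every possible way, optionally reverses each one, and treats a layout and its mirror image as the same layout. It assumes every sub-layout differs from its own reverse, which is the case the count (m!)(2^m)/2 describes.

```python
from itertools import permutations, product
from math import factorial

def layout_permutations(sub_layouts):
    """Unique full layouts built from sub-layouts, counting mirror images once."""
    seen = set()
    unique = []
    for order in permutations(sub_layouts):
        for flips in product((False, True), repeat=len(order)):
            layout = tuple(
                variable
                for sub, flip in zip(order, flips)
                for variable in (reversed(sub) if flip else sub)
            )
            if layout in seen or layout[::-1] in seen:
                continue
            seen.add(layout)
            unique.append(layout)
    return unique

layouts = layout_permutations([("a", "b", "c"), ("d", "e", "f")])
for layout in layouts:
    print("".join(layout))

m = 2
print(len(layouts) == factorial(m) * 2**m // 2)  # True: 4 unique layouts
```

None of the four layouts is b c a d e f, because that ordering breaks up the sub-layout produced for {a, b, c}; this is why the traditional pipeline cannot reach it.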
![Page 36: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/36.jpg)
Experimental Methodology
Evaluating the Solution Space
• Test cases are DSP code kernels from the UTDSP benchmark suite.
• Use gcc to obtain access sequences.
• The quality of a memory layout is evaluated using the minimum-cost circulation technique.
• The entire solution space is found for each access sequence, to be used as a point of reference.
[Figure: flow diagram. Basic Block → Compile with gcc → Access Sequence → Compute Overhead of All Layouts using Minimum-Cost Flow → Distribution of Overheads (histogram of Frequency (Layouts), log scale, against Overhead (Cycles))]

| Kernel | Accesses | Variables | Possible # of layouts |
|---|---|---|---|
| iir_arr | 21 | 8 | 20,160 |
| iir_arr_swp | 33 | 12 | 239,500,800 |
| latnrm_arr_swp | 30 | 10 | 1,824,400 |
| latnrm_ptr | 30 | 10 | 1,824,400 |
| latnrm_ptr_swp | 30 | 10 | 1,824,400 |
![Page 37: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/37.jpg)
Experimental Methodology
Evaluating Current Heuristics
• Identified and implemented three Address-Register Assignment heuristic algorithms:
  • Leupers
  • Sugino
  • Zhuang
[Figure: evaluation pipeline. Access Sequence → ARA heuristic (Leupers, Sugino, Zhuang) → Sub-Sequences → SOA heuristic (Liao, Leupers, ALOMA, OFU, B&B) → Sub-Layouts → Memory Layout Permutations → Memory Layouts → Compute Overhead for each layout via Minimum-Cost Circulation → Distribution of Overhead values]
![Page 38: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/38.jpg)
Experimental Methodology
Evaluating Current Heuristics
• Identified and implemented five Simple Offset Assignment heuristic algorithms:
  • Liao
  • Leupers
  • ALOMA
  • Order-First Use (OFU)
  • Branch and Bound (B&B)
![Page 39: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/39.jpg)
Experimental Methodology
Evaluating Current Heuristics
• Each combination of ARA and SOA algorithm generates a set of sub-layouts.
• All possible memory layout permutations are generated, forming a set of memory layouts.
• Each memory layout is evaluated using the Minimum-Cost Circulation technique.
![Page 40: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/40.jpg)
Results
• The 15 combinations of algorithms produce 15 distributions of overhead values.
• The distributions are aggregated into one distribution.
• The aggregate distributions represent the solution space of all current algorithms.
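The bookkeeping behind the 15 combinations is small enough to sketch. The heuristic names come from the methodology slides; the `aggregate` helper and the histogram representation (overhead in cycles mapped to a number of layouts) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ARA_HEURISTICS = ["Leupers", "Sugino", "Zhuang"]
SOA_HEURISTICS = ["Liao", "Leupers", "ALOMA", "OFU", "B&B"]

# Every ARA heuristic is paired with every SOA heuristic.
combinations = list(product(ARA_HEURISTICS, SOA_HEURISTICS))
print(len(combinations))  # 15

def aggregate(distributions):
    """Sum per-combination histograms (overhead -> number of layouts)."""
    total = Counter()
    for histogram in distributions:
        total.update(histogram)  # Counter.update adds counts key by key
    return total
```

Calling `aggregate` on the 15 per-combination histograms would give the single "algorithmic" distribution that is compared with the exhaustive solution space.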
![Page 41: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/41.jpg)
Results
• Memory layouts have a significant impact on overhead.
• Some layouts have 100% higher overhead than the minimum.
• Over 99% of all layouts have an overhead that is at least 50% higher than the minimum.
![Page 42: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/42.jpg)
Results
• Memory layouts produced by traditional approaches have a large range of possible overhead values -- sometimes the same as the entire solution space itself.
• In some cases, no combination of ARA and SOA heuristics can produce an optimal layout.
![Page 44: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/44.jpg)
Distribution of Overhead Values
Testcase: iir_arr_swp (infinite impulse response filter)

| Overhead (cycles) | Exhaustive (layouts) | Algorithmic (layouts) |
|---|---|---|
| 6 | 144 | 0 |
| 7 | 19557 | 72 |
| 8 | 1514917 | 2240 |
| 9 | 21757157 | 6516 |
| 10 | 90478895 | 10496 |
| 11 | 104101226 | 2565 |
| 12 | 21628904 | 0 |
| Average Overhead | 10.51 | 9.6 |
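The averages in the last row follow directly from the counts. The sketch below recomputes them from the table (the helper names are illustrative) and also measures what share of the exhaustive solution space lies at least 50% above the minimum overhead of 6 cycles.

```python
exhaustive = {6: 144, 7: 19557, 8: 1514917, 9: 21757157,
              10: 90478895, 11: 104101226, 12: 21628904}
algorithmic = {6: 0, 7: 72, 8: 2240, 9: 6516, 10: 10496, 11: 2565, 12: 0}

def mean_overhead(histogram):
    """Average overhead, weighting each overhead value by its layout count."""
    layouts = sum(histogram.values())
    return sum(cycles * count for cycles, count in histogram.items()) / layouts

def share_at_least(histogram, threshold):
    """Fraction of layouts whose overhead is at least `threshold` cycles."""
    layouts = sum(histogram.values())
    return sum(c for cycles, c in histogram.items() if cycles >= threshold) / layouts

best = min(cycles for cycles, count in exhaustive.items() if count > 0)

print(sum(exhaustive.values()))                         # 239500800 layouts in total
print(round(mean_overhead(exhaustive), 2))              # 10.51
print(round(mean_overhead(algorithmic), 2))             # 9.6
print(share_at_least(exhaustive, 1.5 * best) > 0.99)    # True
```

The total of 239,500,800 matches the number of possible layouts listed for iir_arr_swp, and only the 144 layouts at 6 cycles are optimal; none of them appears in the algorithmic column.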
![Page 45: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/45.jpg)
Exhaustive Solution Space
Testcase: iir_arr_swp (infinite impulse response filter)
[Figure: histogram of Frequency (log scale) against Overhead (cycles), for overheads 6 to 12]
![Page 46: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/46.jpg)
Algorithmic Solution Space
Testcase: iir_arr_swp (infinite impulse response filter)
[Figure: histogram of Frequency (linear scale, 0 to 12000) against Overhead (cycles), for overheads 6 to 12]
![Page 47: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/47.jpg)
Efficiency of SOA Algorithms
• For each SOA algorithm, combine it with each of the 3 ARA algorithms to generate 3 distributions of overhead values.
• The distributions can be aggregated to form a single distribution.
![Page 52: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/52.jpg)
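Efficiency of SOA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

| Overhead (cycles) | Liao | Leupers | Sugino | B&B | OFU |
|---|---|---|---|---|---|
| 6 | 0 | 0 | 0 | 0 | 0 |
| 7 | 6 | 6 | 10 | 6 | 44 |
| 8 | 293 | 293 | 357 | 293 | 1004 |
| 9 | 960 | 960 | 1187 | 960 | 2448 |
| 10 | 2154 | 2154 | 2124 | 2154 | 1910 |
| 11 | 619 | 619 | 354 | 619 | 354 |
| 12 | 0 | 0 | 0 | 0 | 0 |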
![Page 53: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/53.jpg)
Efficiency of SOA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)
[Figure: Frequency (0 to 3000) against Overhead (cycles), 6 to 11, with one series each for Liao, Leupers, Sugino, B&B, and OFU]
![Page 54: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/54.jpg)
Efficiency of ARA Algorithms
• For each ARA algorithm, combine it with each of the 5 SOA algorithms to generate 5 distributions of overhead values.
• The distributions can be aggregated to form a single distribution.
![Page 57: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/57.jpg)
Efficiency of ARA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

| Overhead (cycles) | Leupers | Sugino | Zhuang |
|---|---|---|---|
| 6 | 0 | 0 | 0 |
| 7 | 2 | 61 | 9 |
| 8 | 204 | 1483 | 553 |
| 9 | 2089 | 1018 | 3408 |
| 10 | 4740 | 126 | 5630 |
| 11 | 2565 | 0 | 0 |
| 12 | 0 | 0 | 0 |
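One way to compare the three columns is to reduce each to a layout count and a mean overhead. The sketch below does this for the table above; the `summarize` helper is illustrative, and zero-count rows are omitted because they do not affect either figure.

```python
ara_histograms = {
    "Leupers": {7: 2, 8: 204, 9: 2089, 10: 4740, 11: 2565},
    "Sugino": {7: 61, 8: 1483, 9: 1018, 10: 126},
    "Zhuang": {7: 9, 8: 553, 9: 3408, 10: 5630},
}

def summarize(histogram):
    """Return (number of layouts, mean overhead in cycles) for one histogram."""
    layouts = sum(histogram.values())
    weighted = sum(cycles * count for cycles, count in histogram.items())
    return layouts, weighted / layouts

for name, histogram in ara_histograms.items():
    layouts, mean = summarize(histogram)
    print(f"{name}: {layouts} layouts, mean overhead {mean:.2f}")
# Leupers: 9600 layouts, mean overhead 10.01
# Sugino: 2688 layouts, mean overhead 8.45
# Zhuang: 9600 layouts, mean overhead 9.53
```

The Sugino column yields far fewer layouts than the other two, so the means are not computed over equally sized sets of layouts.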
![Page 58: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/58.jpg)
Efficiency of ARA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)
[Figure: Frequency (0 to 6000) against Overhead (Cycles), 6 to 12, with one series each for Leupers, Sugino, and Zhuang]
![Page 59: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/59.jpg)
Evaluating Offset Assignment Algorithms
• There is low variability between SOA algorithms, which may be attributed to small problem sizes.
• The choice of ARA algorithm has more impact on overhead. Much of the variability is attributed to the different number of address registers used.
• For all combinations of SOA and ARA algorithms, the permutation of sub-layouts affects the overhead.
![Page 61: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/61.jpg)
Conclusions
• The objective is to minimize address-computation overhead.
• Given a fixed access sequence and memory layout, the minimum-cost circulation (MCC) technique can minimize overhead.
• Offset assignment algorithms should be evaluated with MCC.
• Offset assignment still has a significant impact on overhead.
• To be effective, current offset assignment algorithms (ARA, SOA) must address the Memory Layout Permutation problem.
![Page 62: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/62.jpg)
Future Work
• A new algorithm is needed to generate memory layouts that will minimize overhead as computed by the Minimum-Cost Flow technique.
• Address-computation overhead must be minimized for loop bodies and for variables that are live between basic blocks and procedures.
![Page 63: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/63.jpg)
References
• Gebotys, C.: DSP address optimization using a minimum cost circulation technique. Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design. 100-103.
• Leupers, R., Marwedel, P.: Algorithms for address assignment in DSP code generation. Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design. 109-112.
• Liao, S., Devadas, S., Keutzer, K., Tjiang, S., Wang, A.: Storage assignment to decrease code size. ACM Transactions on Programming Languages and Systems 18(3) (1996). 235-253.
• Sugino, N., Iimuro, S., Nishihara, A., Fujii, N.: DSP code optimization utilizing memory addressing operation. IEICE Transactions on Fundamentals 8 (1996). 1217-1223.
• Zhuang, X., Lau, C., Pande, S.: Storage assignment optimizations through variable coalescence for embedded processors. Proceedings of the 2003 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems. 220-231.
• Bartley, D.H.: Optimizing stack frame accesses for processors with restricted addressing modes. Software: Practice & Experience 22(2) (1992). 158-172.
![Page 64: Evaluation of Offset Assignment Heuristics](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c5f550346895da5e553/html5/thumbnails/64.jpg)
Questions?