The Microarchitecture of FPGA-Based Soft Processors
![Page 1: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/1.jpg)
The Microarchitecture of FPGA-Based Soft Processors
Peter Yiannacouras
CARG - June 14, 2005
![Page 2: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/2.jpg)
FPGA vs ASIC Flows
[Diagram: a circuit design entering the ASIC flow and the FPGA flow]
• Reduced cost for low volume
• Reduced time-to-market
• Programmability affords customization
=> Designers use FPGAs!
![Page 3: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/3.jpg)
Processors and FPGAs
[Diagrams: custom logic and a processor, placed relative to the FPGA for each option]
□ Option 1: Off-chip processor
  • Increased board area, cost, and latency
□ Option 2: On-chip “hard” processor
  • Specialized part, lack of flexibility
□ Option 3: On-chip “soft” processor
  • Can implement any number of processors
  • Tune each one to meet design constraints
![Page 4: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/4.jpg)
Tuning Processors
Application, design constraints:
• $3, 4 MHz, 800 mW, 2-stage pipeline
• $300, 3.8 GHz, 80 W, 31-stage pipeline
Tuning Soft Processors
Application, design constraints:
• 500 LEs, 40 MHz, 2-stage pipeline
• 1700 LEs, 160 MHz, 6-stage pipeline
• your area, speed, power tradeoff
Automatically Tuning Soft Processors
![Page 5: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/5.jpg)
Understanding Soft Processors
• Tuning requires understanding of the soft processor design space
• We implement many processors and study the design space
[Diagram: Architecture Description → Synthesized Processor → Area, Performance, Power]
![Page 6: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/6.jpg)
Don’t we already understand architecture?
• Not completely
  • We can evaluate area, power, performance
• Not accurately (rules of thumb)
  • FPGA CAD tools are very accurate
• Not in the FPGA domain
  • LUTs vs transistors
  • Relative speed of RAM & multipliers
![Page 7: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/7.jpg)
Goals
1. Develop measurement methodology
2. Populate the design space
3. Compare against industrial soft processor(s)
![Page 8: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/8.jpg)
Measurement Methodology
Require a set of metrics:
• Area
• Performance
• Power
[Diagram: Circuit Design (RTL) → FPGA Flow → Resource Usage, Clock Frequency, Power estimate]
![Page 9: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/9.jpg)
Area
• Logic Elements (LEs – LUT & flip flop)
• Multipliers
• Big RAM
• Little RAM
• Medium RAM
Measure physical area in Equivalent LEs (e.g., a 9-bit multiplier is equivalent to 23 LEs in area)
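The equivalent-LE idea can be sketched as a weighted sum over resource counts. This is only an illustration: the 23-LE multiplier cost is the one figure given on the slide, while the function name, the resource keys, and the example design are assumptions. RAM costs are deliberately left out because the slide does not state them.

```python
# Area cost of each resource type, in equivalent LEs.
# Only the 9-bit multiplier figure (23) comes from the slide; costs for the
# RAM blocks would have to be added for a real device.
LE_COST = {"le": 1, "mult9": 23}


def equivalent_les(usage, cost=LE_COST):
    """Sum resource counts weighted by their equivalent-LE cost."""
    missing = set(usage) - set(cost)
    if missing:
        raise KeyError(f"no equivalent-LE cost for: {sorted(missing)}")
    return sum(count * cost[name] for name, count in usage.items())


# Hypothetical design: 900 LEs plus four 9-bit multipliers.
print(equivalent_les({"le": 900, "mult9": 4}))  # 900 + 4 * 23 = 992
```

Refusing unknown resource types, instead of silently counting them as zero, keeps a missing RAM cost from understating area.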
![Page 10: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/10.jpg)
Performance
Wall Clock Time = #Cycles * Clock Period
• #Cycles: from RTL simulation, averaged over 20 benchmarks
• Clock Period: from CAD tool

| Source | Benchmark |
|---|---|
| MiBench | bitcnts, CRC32, sha, stringsearch, FFT, dijkstra, patricia |
| Freescale | Dhrystone 2.1 |
| Xirisc | bubble_sort, crc, fft, fir, des, quant, iquant, turbo, vlc |
| RATEs | dct, gol |
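As a rough sketch of the wall clock time metric: with the clock frequency in MHz, cycles divided by frequency gives microseconds directly. The function names and cycle counts below are made up for illustration, and the use of an arithmetic mean across benchmarks is an assumption, since the slide only says "averaged".

```python
def wall_clock_us(cycles, fmax_mhz):
    """Wall clock time in microseconds: cycles * clock period (1 / fmax)."""
    # One cycle at f MHz lasts 1/f microseconds.
    return cycles / fmax_mhz


def average_wall_clock_us(cycle_counts, fmax_mhz):
    """Arithmetic mean of per-benchmark wall clock times (assumed averaging)."""
    times = [wall_clock_us(c, fmax_mhz) for c in cycle_counts.values()]
    return sum(times) / len(times)


# Hypothetical cycle counts from RTL simulation; fmax as reported by a CAD tool.
cycles = {"bench_a": 100_000, "bench_b": 300_000}
print(average_wall_clock_us(cycles, fmax_mhz=80.0))  # (1250 + 3750) / 2 = 2500.0
```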
![Page 11: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/11.jpg)
Power
CAD tool can estimate power from an assumed toggle ratio (derived experimentally)
Total Dynamic Power (mW) ÷ Clock Frequency (MHz) = Dynamic Energy excluding I/O per cycle (nJ/cycle)
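The unit arithmetic behind this metric is worth spelling out: milliwatts divided by megahertz is 10^-3 J/s over 10^6 cycles/s, which is 10^-9 J, so the quotient is already in nJ per cycle. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math


def energy_per_cycle_nj(dynamic_power_mw, fmax_mhz):
    """Dynamic energy per cycle in nJ: power (mW) divided by frequency (MHz)."""
    return dynamic_power_mw / fmax_mhz


# Hypothetical figures: 20 mW of dynamic power at 80 MHz.
nj = energy_per_cycle_nj(20.0, 80.0)
print(nj)  # 0.25

# Cross-check in SI units: watts / hertz = joules per cycle.
joules = (20.0e-3) / (80.0e6)
assert math.isclose(nj, joules * 1e9)
```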
![Page 12: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/12.jpg)
Metrics summary
Require the following information:
1. Resource Usage (area – CAD Tool)
2. Clock Frequency (wall clock time – CAD Tool)
3. Power Estimate (energy/cycle – CAD Tool)
4. Cycle Count (wall clock time – RTL Simulator)
![Page 13: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/13.jpg)
RTL-based Design Space Exploration
Complete and accurate understanding of design space
[Diagram: Circuit Design (RTL) and Benchmarks feed the RTL Simulator (1. Correctness, 2. Cycle Count); Circuit Design (RTL) feeds the CAD Tool (3. Area, 4. Clock Frequency, 5. Power)]
![Page 14: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/14.jpg)
Goals
1. Develop measurement methodology
2. Populate the design space
3. Compare against industrial soft processor(s)
![Page 15: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/15.jpg)
Microarchitectural Design Space Exploration
Need fast route to RTL from architectural idea
[Diagram: Circuit Design (RTL) and Benchmarks feed the RTL Simulator (1. Correctness, 2. Cycle Count); Circuit Design (RTL) feeds the CAD Tool (3. Area, 4. Clock Frequency, 5. Power)]
![Page 16: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/16.jpg)
SPREE (Soft Processor Rapid Exploration Environment)
[Diagram: the SPREE RTL Generator produces the RTL; RTL and Benchmarks feed the RTL Simulator (1. Correctness, 2. Cycle Count); RTL feeds the CAD Tool (3. Area, 4. Clock Frequency, 5. Power)]
![Page 17: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/17.jpg)
Goals
1. Develop measurement methodology
2. Populate the design space (SPREE)
   1. Rapidly
   2. With interesting designs
   3. Accurately (minimize overhead)
3. Compare against industrial soft processor(s)
![Page 18: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/18.jpg)
Related Work
• Parametrized Cores
  • Narrow design space, laborious changes to control
• Architecture Description Languages (ADLs)
  • Too robust, inaccurate (simulator based, or behavioural RTL)
• PEAS-III/ASIPMeister [Itoh2000]
  • Non-FPGA specific, ISA design focus
![Page 19: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/19.jpg)
SPREE RTL Generator Overview
[Diagram: ISA Description, Datapath Description, and Component Library → SPREE RTL Generator → Efficiently Synthesizable RTL]
• Interesting: allows for interesting architectures
• Rapidly: simple descriptions
• Accurately: efficient component implementations
![Page 20: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/20.jpg)
Some current limitations
• No caches (use fast on-chip RAM)
• Simple in-order issue pipelines
• No dynamic branch prediction
• No OS or exceptions support
• No ISA changes!
  • Need compiler generation to support
  • Use subset of MIPS-I
![Page 21: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/21.jpg)
Architecture Input
Component Library
[Diagram: library components Ifetch, Reg File, ALU, Mul, Data Mem, WriteBack]
![Page 22: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/22.jpg)
Architecture Input
Component Library
Datapath Description
[Diagram: components Ifetch, Regfile, ALU, Mul, Data Mem, WriteBack taken from the library and connected into a datapath]
![Page 23: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/23.jpg)
Architecture Input
[Diagram: ISA Description, Datapath Description, and Component Library feed the SPREE RTL Generator, which emits the datapath (IF, Regfile, ALU, Mul, Data Mem, WriteBack) with generated Decode logic]
• Control generation saves time and is non-critical
![Page 24: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/24.jpg)
Architecture Input: ISA Description
• Generic Operations (GENOPs)
• MIPS instructions made of GENOPs
GENOPs: FETCH, RFREAD, ADD, RFWRITE
MIPS ADD – add rd, rs, rt
[Diagram: MIPS ADD built from FETCH, two RFREADs, ADD, and RFWRITE]
![Page 25: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/25.jpg)
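Complete Experimental Framework Using SPREE
[Diagram: ISA Description, Datapath Description, and Component Library feed the SPREE RTL Generator (one input is labeled FIXED); the generated RTL and the Benchmarks feed the RTL Simulator (1. Correctness, 2. Cycle Count); the RTL feeds the CAD Tool (3. Area, 4. Clock Frequency, 5. Power)]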
![Page 26: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/26.jpg)
Goals
1. Develop measurement methodology (Area, Performance, Power)
2. Populate the design space (SPREE)
3. Compare against industrial soft processor(s)
![Page 27: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/27.jpg)
Altera’s NiosII
• Second generation soft processor
• Has three variations:
  • NiosIIe – unpipelined, no hardware multiply
  • NiosIIs – 5 stages, no branch prediction
  • NiosIIf – 6 stages, dynamic branch prediction
• Caveats
  • Supports exceptions, OS, and caches
  • Very similar but tweaked ISA
![Page 28: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/28.jpg)
Design Space vs NiosII Variations
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs); series: Generated Designs, Altera NiosIIe, Altera NiosIIs, Altera NiosIIf]
![Page 29: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/29.jpg)
Summary
1. We span the design space
2. Remain competitive
   • Achieved 9% faster and 11% smaller than NiosIIs
   => don’t suffer from prohibitive overhead
Let’s explore some architecture!
![Page 30: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/30.jpg)
Architectural Axes
1. Hardware vs Software Multiplication
2. Shifter implementation
3. Pipeline
   • Depth
   • Organization
   • Forwarding
![Page 31: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/31.jpg)
Hardware vs Software Multiplication
• Hardware multiplication
  • Increases area & power consumption
  • Speeds up execution
• BUT …
  • Not all applications care about speed
  • Not all applications use multiplication (significantly)
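To make the software side of this tradeoff concrete, here is a sketch of a classic shift-and-add multiply, the kind of routine a processor without a hardware multiplier can fall back on. It is not the routine used in the study, which the slides do not show. The 32-bit width and the iteration counter are assumptions added to illustrate why software multiplication costs many cycles per multiply.

```python
MASK32 = 0xFFFFFFFF


def soft_mul32(a, b):
    """Shift-and-add multiply; returns (low 32 bits of a*b, loop iterations)."""
    a &= MASK32
    b &= MASK32
    product = 0
    iterations = 0
    while b:
        if b & 1:                       # add the shifted multiplicand
            product = (product + a) & MASK32
        a = (a << 1) & MASK32           # next partial product
        b >>= 1                         # consume one multiplier bit
        iterations += 1
    return product, iterations


print(soft_mul32(6, 7))  # (42, 3)
```

Each loop iteration stands for several instructions on a real processor, so a single multiply can take tens of cycles where a hardware multiplier needs only a few.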
![Page 32: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/32.jpg)
Cycle Count Speedup of Hardware Multiplication
[Figure: bar chart of Cycle Count Speedup (axis 0.0 to 9.0) per benchmark; benchmarks in axis order: dijkstra, dhry, qsort, fir, FFT, dct, quant, fft, iquant; data labels: 1.01, 1.03, 1.04, 1.39, 2.72, 3.00, 4.53, 6.94, 7.87]
Must understand its cost/benefit to decide when to use
![Page 33: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/33.jpg)
Cost of Hardware Multiply
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs); series: Multiply Full Hardware Support, Multiply Software Routine, Altera NiosIIe, Altera NiosIIs, Altera NiosIIf]
• ~250 LEs (20%)
• 35% more Energy/cycle
![Page 34: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/34.jpg)
Shifter Implementations
• Shifters (multiplexers) are big in FPGAs
• Consider 3 implementations:
  • Serial shifter
  • LUT-based barrel shifter
  • Multiplier-based barrel shifter
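The three shifter styles compute the same result by different means, which a small behavioural model can show. This is a sketch under stated assumptions, not the RTL from the study: it covers only 32-bit left shifts, the serial model assumes one bit position per cycle, and the function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def serial_shift_left(x, n):
    """Shift one bit position per step; returns (result, steps taken)."""
    steps = 0
    for _ in range(n):
        x = (x << 1) & MASK32
        steps += 1
    return x, steps


def barrel_shift_left(x, n):
    """Five multiplexer levels, each shifting by a power of two if selected."""
    for level in range(5):              # shift amounts 1, 2, 4, 8, 16
        if (n >> level) & 1:
            x = (x << (1 << level)) & MASK32
    return x


def multiplier_shift_left(x, n):
    """A left shift by n is a multiplication by 2**n."""
    return (x * (1 << n)) & MASK32


x = 0x0000F00F
print(hex(barrel_shift_left(x, 4)), hex(multiplier_shift_left(x, 4)))
```

The serial version needs almost no multiplexing but takes as many steps as the shift amount, while the two barrel versions finish in a single pass at the cost of multiplexers or a multiplier.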
![Page 35: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/35.jpg)
Impact of Shifter Implementation
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs) for the Serial, Multiplier-based, and LUT-based shifters; series: 2-stage, 3-stage, 4-stage, 5-stage, 7-stage]
Consistent across different pipe depths
![Page 36: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/36.jpg)
Shifter Implementation Tradeoffs

| Shifter | Area (LEs) | Wall Clock Time (us) | Energy per Cycle (nJ/cycle) |
|---|---|---|---|
| Serial | 1035 | 3458 | 0.2114 |
| Multiplier-based barrel | 1102 | 1945 | 0.2174 |
| LUT-based barrel | 1297 | 1916 | 0.2409 |

Averaged over all pipeline depths
• Smallest: Serial
• Fastest: LUT-based barrel
• Energy efficient: Serial
Multiplier is very nice sweet spot
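The "sweet spot" claim is easier to see once each shifter is expressed relative to the serial one. The numbers below are the ones reported on the slide, while the helper function and the choice of the serial shifter as the baseline are illustrative assumptions.

```python
# (area in LEs, wall clock time in us, energy per cycle in nJ), from the slide.
SHIFTERS = {
    "Serial": (1035, 3458, 0.2114),
    "Multiplier-based barrel": (1102, 1945, 0.2174),
    "LUT-based barrel": (1297, 1916, 0.2409),
}


def relative_to(baseline, data=SHIFTERS):
    """Return each metric of every shifter as a ratio to the baseline shifter."""
    base = data[baseline]
    return {
        name: tuple(value / ref for value, ref in zip(metrics, base))
        for name, metrics in data.items()
    }


for name, (area, time, energy) in relative_to("Serial").items():
    print(f"{name}: area x{area:.2f}, time x{time:.2f}, energy/cycle x{energy:.2f}")
```

Read this way, the multiplier-based barrel shifter recovers nearly all of the LUT-based shifter's speed advantage over the serial shifter for a much smaller increase in area.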
![Page 37: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/37.jpg)
Pipelines - Depth
• Study different pipeline depths
  • Over 3 shifters
• Arrows = possible forwarding lines (not used)
• All use predict not-taken branches
![Page 38: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/38.jpg)
Pipelining & clock frequency
[Figure: bar chart of Frequency (MHz), axis 0 to 120, for the Serial, Mul-based, and LUT-based shifters and their AVERAGE; series: 2-stage, 3-stage, 4-stage, 5-stage, 7-stage]
![Page 39: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/39.jpg)
Impact of Pipelining
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs) for the Serial, Multiplier-based, and LUT-based shifters; series: 2-stage, 3-stage, 4-stage, 5-stage, 7-stage]
Adds area, can increase speed (2 to 3 stage?)
![Page 40: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/40.jpg)
FPGA Nuance: Synchronous RAMs
2-stage Pipeline
[Diagram: Ifetch, Regfile, ALU, Mul, Data Mem, and WriteBack arranged in a 2-stage pipeline]
Stall on all loads, and any operand fetches
![Page 41: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/41.jpg)
3-stage Pipeline
[Diagram: Ifetch, Regfile, ALU, Mul, Data Mem, and WriteBack arranged in a 3-stage pipeline]
Fewer stalls, increased frequency => Big speedup (1.7x)
![Page 42: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/42.jpg)
3, 4 and 5 stage pipelines
• Increased area, small change in performance
=> Deeper pipelines have potential for better speedups
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs) for the Serial, Multiplier-based, and LUT-based shifters; series: 2-stage, 3-stage, 4-stage, 5-stage, 7-stage]
![Page 43: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/43.jpg)
The 7-stage Pipeline
Where Branch Delay Slots break down
The ideal case:
[Diagram: pipeline holding BEQ, OR, JR, ADD, …, with squashed instructions marked X; annotation: Never squash this stage]
![Page 44: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/44.jpg)
Problem: Separation of Branch and Branch Delay Slot
[Diagram: pipeline holding BEQ, ADD, JR, …; annotation: Stalls on RAW hazard]
![Page 45: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/45.jpg)
Problem: Separation of Branch and Branch Delay Slot
[Diagram: pipeline holding BEQ, ADD, JR, NOP, …, with a squashed instruction marked X]
Must track and protect delay slots
![Page 46: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/46.jpg)
Multiple Delay Slots
• Must detect separation of branch from delay slot
• OR prevent multiple delay slots
  • Stall branch if a delay slot exists in the pipe
  • We did this one (+30 LEs, -15% clock frequency)
[Diagram: pipeline holding BEQ, OR, JR, ADD, …; annotation: Can’t guard all delay slots]
Better off eliminating delay slots – currently researching
![Page 47: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/47.jpg)
Pipeline organization
• Where stages are placed is important
• Pipe stage placement can
  • Result in all around “win/loss”
  • Present a tradeoff
[Figure: Wall Clock Time (us) vs Area (LEs) for the Serial, Mul-based, and LUT-based shifters; series: 4-Stage (H), 4-Stage (B)]
![Page 48: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/48.jpg)
Forwarding
• SPREE supports stage to stage forwarding
[Diagram: pipeline of Ifetch, Reg File, ALU, Mul, Data Mem, and Write Back, with forward lines for rs and rt]
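The decision a forwarding line enables can be sketched as a per-operand check: if a later stage is about to write the register that the current instruction reads, the value is either forwarded, when a line exists for that operand, or the pipeline stalls. This is a generic illustration of forwarding and not SPREE's generated control logic. The function name, return values, and single-producer model are assumptions.

```python
def resolve_operand(src_reg, producer_dest, forward_line_present):
    """Decide where an operand comes from: 'regfile', 'forward', or 'stall'.

    src_reg: register number read by the current instruction (rs or rt).
    producer_dest: register written by the in-flight instruction, or None.
    forward_line_present: whether this operand has a forwarding line.
    """
    # MIPS register 0 always reads as zero, so it never creates a hazard.
    if src_reg == 0 or producer_dest is None or src_reg != producer_dest:
        return "regfile"
    return "forward" if forward_line_present else "stall"


# Hypothetical case: the in-flight instruction writes r3, which is read as rs.
print(resolve_operand(3, 3, forward_line_present=True))   # forward
print(resolve_operand(3, 3, forward_line_present=False))  # stall
```

With separate lines for rs and rt, a design can choose to forward only one operand, both, or neither.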
![Page 49: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/49.jpg)
Effect of Forwarding
[Figure: Average Wall Clock Time (us) vs Area (Equivalent LEs) for no forwarding, forward rt, forward rs, and forward rs&rt; series: 3-stage, 4-stage, 5-stage]
20% speed increase
![Page 50: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/50.jpg)
An Aside: ISA Subsetting
• Applications don’t generally use all instructions
[Figure: ISA Usage In Each Benchmark; percentage of the ISA used (0% to 100%) for bubble_sort, crc, des, fft, fir, quant, iquant, turbo, vlc, bitcnts, CRC32, qsort, sha, stringsearch, FFT, dijkstra, patricia, gol, dct, dhry, and AVERAGE]
![Page 51: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/51.jpg)
Processor reduction
• Can strip away unused components/control
• Generator supports instruction disabling
  • Automatically strips away unused components
  • Create an Application Specific processor
  • Do this for each benchmark
• FPGAs are a good platform for this!
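The subsetting step can be sketched as a set computation: collect the components needed by the instructions an application actually uses, then strip the rest. The instruction-to-component table below is hypothetical and much smaller than a real MIPS-I subset, and the function is an illustration, not the generator's implementation.

```python
# Hypothetical mapping from instructions to the datapath components they need.
# The names are illustrative only and are not taken from SPREE.
COMPONENTS_FOR = {
    "add": {"alu"},
    "lw": {"alu", "data_mem"},
    "sll": {"shifter"},
    "mult": {"multiplier"},
}


def subset_processor(used_instructions, components_for=COMPONENTS_FOR):
    """Return (kept, stripped) component sets for one application."""
    all_components = set().union(*components_for.values())
    kept = set()
    for instr in used_instructions:
        kept |= components_for[instr]
    return kept, all_components - kept


# An application that only adds and loads needs no shifter or multiplier.
kept, stripped = subset_processor({"add", "lw"})
print(sorted(kept), sorted(stripped))
```

Repeating this per benchmark yields one application-specific processor for each, which is practical on an FPGA because the fabric can simply be reprogrammed.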
![Page 52: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/52.jpg)
Area of a Subsetted Processor
[Figure: Area Measurements for a Processor Subsetted Over Benchmark Set; bar chart of Area (LEs), axis 0 to 1400, per processor: ORIGINAL, bubble_sort, crc, des, fft, fir, quant, iquant, turbo, vlc, bitcnts, CRC32, qsort, sha, stringsearch, FFT, dijkstra, patricia, gol, dct, dhry, and AVERAGE]
![Page 53: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/53.jpg)
Speed of a Subsetted Processor
[Figure: Fmax Measurements for a Processor Subsetted Over Benchmark Set; bar chart of Fmax (MHz), axis 50.00 to 70.00, per processor: cycles, bubble_sort, crc, des, fft, fir, quant, iquant, turbo, vlc, bitcnts, CRC32, qsort, sha, stringsearch, FFT, dijkstra, patricia, gol, dct, dhry, and AVERAGE]
![Page 54: The Microarchitecture of FPGA-Based Soft Processors](https://reader036.vdocument.in/reader036/viewer/2022062804/56814b8d550346895db87208/html5/thumbnails/54.jpg)
Conclusion
• Understanding architectural trade-offs => Maximize efficiency
• Developed SPREE & measurement methodology
• Performed preliminary architectural study
  • Quantified cost of hardware multiplication
  • Explored shift unit implementations
  • Explored pipelines: depth, organization, forwarding