![Page 1: VLIW DSP Processor Design for Mobile Communication … · 2013. 1. 9. · Embedded processing of lower communication layers. Characteristica. Mix of traditional loop-centric DSP algorithms](https://reader035.vdocument.in/reader035/viewer/2022063016/5fd51cc6d369052f523679e2/html5/thumbnails/1.jpg)
VLIW DSP Processor Design for Mobile Communication Applications
Contents crafted by Dr. Christian Panis
Catena Radio Design
Agenda
Trends in mobile communication
Architectural core features with significant impact on performance
Case study: 3a – a scalable VLIW architecture
Design space exploration
Challenges of scalability
Summary
Trends in Mobile Communication
IEEE Spectrum, July 2004
Trends in Mobile Communication
Embedded systems are increasingly emerging
Bandwidth demands lead to a significant increase in computational requirements
Trade-off: power dissipation vs. flexibility vs. performance
Cost pressure, feature size, application space
Multistandard solutions
Application-specific and customizable processors
How to Tackle the Problem?
Application-specific processor architectures
  provide support for application-specific requirements
  provide domain-specific problem solutions
  provide a trade-off: power vs. area effort vs. flexibility
Domain-Specific Processor Architectures
Things to be considered
  For each core a separate tooling/tool chain?
  How to analyse the application-specific requirements?
  How to analyse the gain compared with a standard core solution?
  How to deal with additional verification effort caused by flexibility?
Focus / Why VLIW?
Focus: embedded processing of lower communication layers
Characteristics: mix of traditional loop-centric DSP algorithms with control code
Load/store VLIW is one possible solution
  Real-time requirements for signal processing algorithms (+)
  High ILP support allows efficient execution of inner loops (+)
  Code density drawback of VLIW (-)
  Poor cache support (+/-)
Architectural Key Characteristics
Register file: size, number of entries, structure
Data path(s): number, parallel availability, type
Memory ports / bandwidth: number, data width, supported granularity
ISA, binary coding: instruction mapping, binary coding, native instruction word size
Pipeline structure: number of stages, exceptions
Architectural Key Characteristics
Register file: size, number of entries, structure
[Figure: two register file configurations. The larger one provides D[0..31], L[0..15], A[0..15] (data registers d0 to d31, accumulator/long registers a0/l0 to a15/l15); the smaller one provides D[0..15], L[0..15], A[0..15].]
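Architectural Key Characteristics
Data path(s): number, parallel availability, type of supported functions
[Figure: two data path configurations. In the first, a SIMD MUL unit forms res 1 = op1 × op2 and res 2 = op4 × op5, and two combined adder/ALU units (+/ALU) take res 1 with op3 and res 2 with op6 to produce res 3 and res 4. In the second, the SIMD MUL unit is followed by plain adders (+) producing res 3 and res 4, and two separate ALUs operate on op1, op2, op3 and on op4, op5, op6 to produce res 5 and res 6.]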
Architectural Key Characteristics
Memory ports / bandwidth: number, data width, supported granularity
[Figure: two memory port configurations, each showing address generation units AGU 1 and AGU 2 with memory ports PORT 1 and PORT 2.]
Architectural Key Characteristics
ISA, binary coding: instruction mapping, binary coding, native instruction word size
ISA: two-operand and three-operand instruction forms
  add op1, op2        add op1, op2, op3
  sub op1, op2        sub op1, op2, op3
  …

| Native instruction word size | Number of instructions | Long instructions | Bytes |
| --- | --- | --- | --- |
| 20 bits | 710 | 164 | 2185 |
| 16 bits | 710 | 215 | 1850 |
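The byte totals on this slide are consistent with a simple model in which every long instruction occupies one extra native instruction word. The sketch below checks that arithmetic, reading the slide figures as 710 instructions with 164 long instructions (20-bit words) and 215 long instructions (16-bit words). The function name `code_size_bytes` and the one-extra-word model are assumptions for illustration, not taken from the slides.

```python
def code_size_bytes(num_instructions: int, num_long: int, word_bits: int) -> float:
    """Code size in bytes, assuming each long instruction needs one extra native word."""
    total_words = num_instructions + num_long
    return total_words * word_bits / 8


# Figures as read from the slide.
size_20 = code_size_bytes(710, 164, 20)
size_16 = code_size_bytes(710, 215, 16)
print(size_20, size_16)
```

Under this model the narrower word needs more long instructions but still yields the smaller binary, which is the trade-off the slide illustrates.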
Architectural Key Characteristics
Pipeline structure: number of stages, exceptions
[Figure: two pipeline configurations. The shorter one has the stages Instruction Fetch, Alignment, Instruction Decode, Execute 1, Execute 2, with a single address calculation for load/store operations, operand reads (op1, op2), and one write back. The longer one adds Execute 3, with address calculation 1 and 2, an address register update, operand reads (op1, op2a, op2b), and two write backs (accu write back 1, write back 2).]
3a: case study
Key architectural aspects
  Modified dual Harvard load-store architecture
  RISC instruction set
  xLIW (scalable long instruction word)
  Orthogonal register file and ISA
  Destination register based predicated execution
  Instruction buffer for power-efficient inner loop processing
  Scalable and configurable core architecture
  Architecture specified considering an optimizing C-compiler
3a: case study
C-compiler aspects
Compiler-friendly:
  Load/store architecture
  Orthogonal ISA
  Large uniform register files
  Functionality stored in ISA
  Simple issue rules
Compiler-unfriendly:
  Mode-dependent instructions
  Irregular instructions
  Implicit dependencies
  Modes for instruction sets
3a: case study
xLIW – scalable long instruction word
[Figure: program memory holds the instruction stream inst0 … inst n in rows of four (inst0 to inst3, inst4 to inst7, and so on). The align unit passes execution bundles of varying length to the decoder ports (LD/ST, LD/ST, CMP, CMP, PSEQ): inst0 inst1 in cycle m, inst2 in cycle m+1, inst3 inst4 in cycle m+2, inst5 in cycle m+3, inst6 inst7 in cycle m+4.]
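The bundle formation drawn on this slide can be sketched as a small grouping routine. The slides do not say how bundle boundaries are encoded, so the per-instruction end-of-bundle flag used here is an assumption, as is the function name `form_execution_bundles`.

```python
from typing import List, Tuple


def form_execution_bundles(stream: List[Tuple[str, bool]]) -> List[List[str]]:
    """Group a linear instruction stream into execution bundles.

    Each entry is (name, ends_bundle). The end-of-bundle flag is an assumed
    encoding; the slides do not specify how boundaries are marked.
    """
    bundles: List[List[str]] = []
    current: List[str] = []
    for name, ends_bundle in stream:
        current.append(name)
        if ends_bundle:
            bundles.append(current)
            current = []
    if current:  # trailing instructions without a closing flag
        bundles.append(current)
    return bundles


# Bundle lengths 2, 1, 2, 1, 2, matching cycles m .. m+4 on the slide.
stream = [("inst0", False), ("inst1", True), ("inst2", True),
          ("inst3", False), ("inst4", True), ("inst5", True),
          ("inst6", False), ("inst7", True)]
bundles = form_execution_bundles(stream)
for cycle, bundle in enumerate(bundles):
    print(f"cycle m+{cycle}: {' '.join(bundle)}")
```

With the five decoder ports shown on the slide, a fixed-width encoding of these five bundles would occupy 25 slots for 8 instructions; storing only the real instructions in program memory avoids that padding.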
3a: case study
Destination register based predicated execution
[Figure: execution units (load/store, load/store, arithmetic, arithmetic, predicated execution) shown together with the flag register file.]
3a: case study
Orthogonal register file incl. flag register file
[Figure: register file split into data, address, and flag parts. Data part: data registers (d0, d1), long register (l0), accumulator register (a0, with gb). Address part: address register (r0) and modifier register (m0).]
3a: case study
"3-phase" pipeline
Pipeline stages: Instruction Fetch, Alignment, Instruction Decode, Execute 1, Execute 2
Phase 1: fetch
Phase 2: decode
Phase 3: execute
3a: case study
Things to be considered
  For each core a separate tooling/tool chain?
  How to analyse the application-specific requirements?
  How to deal with additional verification effort caused by flexibility?
  How to analyse the gain compared with a standard core solution?
Scalable core architecture
  Adaptable to application-specific requirements
  Scalable in performance
"one core" – "one tool chain"
Design Space Exploration – Design Flow
[Figure: design flow in two phases. Evaluation phase: for each candidate core configuration (configuration 1, configuration 2, … configuration n), the compiler generator, hardware generators, testcase generator, and documentation generator derive the tools; the application C-code passes through the optimizing C-compiler, assembler, linker, and ISS, with functional testing, yielding static analysis results, dynamic analysis results, and a verification report. Production phase: the same flow is run for the chosen core configuration, together with a bincode generator, and produces the binary executable.]
Design Space Exploration – Static Analysis
Code size: measures how efficiently the application can be mapped onto a processor in terms of required code space
Parallelism: measures how efficiently the application code can be mapped onto a parallel architecture
Instruction histogram: measures how frequently instructions are used when mapping the application code onto the chosen ISA
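A minimal sketch of how these three static measures might be computed from scheduled code. The program, the mnemonics, and the 16-bit-per-instruction size model are hypothetical; the slides do not describe the actual analysis tooling.

```python
from collections import Counter

# Hypothetical scheduled code: each inner list is one execution bundle.
program = [
    ["ld", "ld"],
    ["mac", "ld", "ld"],
    ["mac"],
    ["st", "add"],
]

# Instruction histogram: how often each mnemonic appears in the static code.
histogram = Counter(op for bundle in program for op in bundle)

# Parallelism: average number of instructions per execution bundle.
num_instructions = sum(len(bundle) for bundle in program)
parallelism = num_instructions / len(program)

# Code size: assumes one 16-bit word per instruction (illustrative only).
code_size_bytes = num_instructions * 16 // 8

print(histogram.most_common(), parallelism, code_size_bytes)
```

All three numbers depend only on the compiled code, not on input data, which is what makes them static measures.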
Design Space Exploration – Dynamic Analysis
Program memory fetch: measures how efficiently memory fetches are used; mainly influenced by pipelined processors and application code with low branch distance and high branch frequency
Execution count per bundle: measures how often a certain execution bundle is executed
Execution count per instruction: measures how frequently a certain instruction is executed
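The two execution counts can be sketched as simple tallies over a simulator trace. The trace format (a list of executed bundle addresses) and the example program are assumptions for illustration; the slides do not specify what the ISS emits.

```python
from collections import Counter

# Hypothetical program: bundle address -> instructions in that bundle.
bundles = {0: ["ld", "ld"], 1: ["mac", "ld"], 2: ["st"]}

# Hypothetical ISS trace: bundle addresses in execution order
# (bundle 1 is a loop body executed three times).
trace = [0, 1, 1, 1, 2]

# Execution count per bundle.
bundle_counts = Counter(trace)

# Execution count per instruction, accumulated over every executed bundle.
instruction_counts = Counter(op for addr in trace for op in bundles[addr])

print(bundle_counts, instruction_counts)
```

Bundles with high counts mark the inner loops, which is where changes to the core configuration have the largest effect on cycle count.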
Design Space Exploration – Example
Design Space Exploration – Statistics
3a: case study
Things to be considered
  For each core a separate tooling/tool chain?
  How to analyse the application-specific requirements?
  How to analyse the gain compared with a standard core solution?
  How to deal with additional verification effort caused by flexibility?
Application code analysis
  Detailed static and dynamic analysis
  Quantitative analysis of different core-based platforms
  Balance different core features against area/power consumption
  Quantitative support to optimize HW/SW partitioning
  Identify "hot spots"
3a: case study
Benchmarking?
Do the application requirements fit standard benchmarks?
Application benchmarking: benchmarking of the target architecture
Design Space Exploration
Things to be considered
  For each core a separate tooling/tool chain?
  How to analyse the application-specific requirements?
  How to analyse the gain compared with a standard core solution?
  How to deal with additional verification effort caused by flexibility?
Analysis of application code on "function" level
Compare MIPS/memory requirements for different application setups and for different core architectures
Benchmarking for the target architecture
Challenges of scalability
Verification effort versus Flexibility
XML based configuration file
Binary code generator
Documentation generator/adaptation
HW code generator
Testcase generator
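A minimal sketch of the idea that a single XML configuration file drives all generators. The element and attribute names below are hypothetical, since the slides do not show the actual schema; only the Python standard library parser is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical core configuration; element and attribute names are illustrative only.
CONFIG_XML = """
<core name="example">
  <registerfile data="16" address="16"/>
  <datapath mac="2" alu="2"/>
  <memory ports="2" width="32"/>
  <pipeline execute_stages="2"/>
</core>
"""


def load_config(xml_text: str) -> dict:
    """Parse the configuration into {section: {parameter: int}}."""
    root = ET.fromstring(xml_text)
    return {child.tag: {k: int(v) for k, v in child.attrib.items()}
            for child in root}


config = load_config(CONFIG_XML)
print(config["registerfile"]["data"], config["pipeline"]["execute_stages"])
```

Each generator (binary code, documentation, hardware, testcase) would read the same parsed structure, so a change to one parameter propagates consistently to the tools, the hardware description, and the tests.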
Summary
Application-specific processors make it possible to meet area and power dissipation requirements in SoCs for mobile communication platforms
Multistandard requirements lead to domain-specific processor architectures
"one core" – "one tool chain"
Design space exploration is required to analyse domain-specific requirements on the core subsystem