Posted on 20-Dec-2015
Quantitative approach to ISA design and compilation for code size reduction
SimpLight Nanoelectronics Ltd
Kevin Lo, Lin Ma
SimpLight Confidential
Outline
Introduction
Problem definition
Existing approach
Our approach
• Hardware support
• Compiler support
Experimental results
Summary
Introduction
Code size is a critical issue for embedded applications.
A mixed-size instruction set is commonly used:
• Normal-length instruction set
• Compressed-length instruction set
Problem definition
Trade-off between:
• Maximum code size compression ratio
• Minimum performance degradation
• Implementation cost
Existing approach
Scheme | Methodology | Decoding | Compression ratio | Performance penalty | Hardware cost | Compiler complexity
------ | ----------- | -------- | ----------------- | ------------------- | ------------- | -------------------
ARM - Thumb | Extended ISA + mode switching | Instruction mapping | 20-30% | Very high | Thumb engine | Low
ARM - Thumb-2 | Separated ISA with mapping engine | Instruction mapping | 15-25% | High | Thumb-2 instruction mapping engine | High
MIPS - MIPS16e | Extended ISA + mode switching | Native support | 20-30% | Very high | Special branch detection engine | Low
IBM - CodePack | Binary compression via software engine | Built-in decompressor engine | 20-30% | Negligible | Hardware decompressor | No effort
ARC - ARCompact | 16-bit instruction support via user-defined interface | Native support | 20-40% | Negligible | Complex reconfigurable processor | Low
Instruction Analysis
Real applications
• uClibc: open-source C library package for embedded applications
• 729a: voice codec program used in mobile phones
• Mpeg4: MPEG-4/ASP decoder program
• Nucleus: RTOS for embedded processors (ported to the MIPS-like architecture)
• libmad: MPEG audio decoder library
• uClinux: Linux kernel release 2.6.xx for embedded processors
• Lay2/3: Layer 2 and Layer 3 of the GSM wireless communication protocol stack
Instruction Analysis
[Figure: 32-bit instruction formats: 3 GPRs; 2 GPRs; 26-bit index (6-bit opcode + 26-bit index); 2 GPRs + imm16 (6-bit opcode, two 5-bit registers, 16-bit immediate); 2 GPRs + imm5; no operands]
3% use 16 bits or less
[Figure: 16-bit instruction format, highlighting the 2 GPRs and no-operand formats]
8%: result register == one source operand
[Figure: 16-bit instruction format, highlighting the 3 GPRs and 2 GPRs + imm5 formats]
20.9% use $0
[Figure: 16-bit instruction format, highlighting the 3 GPRs and 2 GPRs + imm5 formats]
21.3% use the stack pointer
[Figure: 16-bit instruction format, highlighting the 2 GPRs + imm16 format]
4.4% use immediate 0
[Figure: 16-bit instruction format, highlighting the 2 GPRs + imm5 and 2 GPRs + imm16 formats]
In total, 47.47% need only 16 bits
[Figure: 16-bit instruction format, highlighting all six formats: 3 GPRs; 2 GPRs; 2 GPRs + imm16; 2 GPRs + imm5; no operands]
Maximum 24% code size reduction ratio
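One plausible reading of the 24% bound: if 47.47% of instructions shrink from 32 to 16 bits and nothing else changes, the saving is 0.4747 × (1 − 16/32) ≈ 23.7% of the original code size. A quick check, assuming every instruction starts as 32-bit and ignoring any padding or mode-switch overhead (`max_code_size_reduction` is a hypothetical helper, not from the slides):

```python
def max_code_size_reduction(frac_short: float, short_bits: int = 16, long_bits: int = 32) -> float:
    """Upper bound on code size reduction when a fraction `frac_short` of all
    instructions moves from a long_bits encoding to a short_bits encoding.
    Assumes all instructions were long_bits wide and no padding is added."""
    return frac_short * (1 - short_bits / long_bits)


print(f"{max_code_size_reduction(0.4747):.1%}")  # 23.7%
```

The measured 17%-23% is below this bound, which is consistent with not every candidate being convertible in practice.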
Hardware support
Decoding phase: instruction fetch handler
• Word-aligned 16-bit instruction: handled the same way as a 32-bit instruction
• Half-word-aligned 16-bit instruction: shifted into word alignment to form a 32-bit fetch word
[Figure, word-aligned PC: Fetch → Instruction Buffer → Execution Unit. The 16-bit instruction is decoded into a 32-bit instruction; the upper half-word is discarded.]
[Figure, half-word-aligned PC: Fetch → Instruction Buffer → Execution Unit. The upper half-word is shifted to be word-aligned and decoded into a 32-bit instruction.]
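A behavioral sketch of the fetch handler, assuming little-endian half-word order inside each 32-bit fetch word (the slides do not state the byte order). The `fetch16` helper is hypothetical and only extracts the 16-bit instruction; expanding it into a decoded 32-bit instruction is not modeled.

```python
from typing import List


def fetch16(words: List[int], pc: int) -> int:
    """Return the 16-bit instruction at byte address `pc`.

    Assumption: the lower half-word of each fetch word holds the instruction
    at the word-aligned address, the upper half-word the one at pc + 2.
    """
    if pc % 2:
        raise ValueError("PC must be at least half-word aligned")
    word = words[pc // 4]          # the fetch unit always reads a whole aligned word
    if pc % 4 == 0:
        return word & 0xFFFF       # word-aligned: keep lower half, discard upper half
    return (word >> 16) & 0xFFFF   # half-word aligned: shift upper half down


memory = [0xBBBBAAAA, 0x12345678]
for pc in (0, 2, 4, 6):
    print(hex(pc), hex(fetch16(memory, pc)))
```

With this memory image the loop prints 0xaaaa, 0xbbbb, 0x5678 and 0x1234 for PCs 0, 2, 4 and 6.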
Compiler support
Compiler flow (figure):
Normal instruction set → Instruction selection (tag: analyze OPs to tag 16-bit candidates) → Register allocation → Instruction replacement → Code emitting → Mixed-instruction assembly
Scheduling for code size: schedule for the maximum number of paired 16-bit OPs
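The tag-and-replace steps can be sketched as a pass that marks a 32-bit instruction as 16-bit when it matches one of the patterns counted in the instruction analysis: result equals one operand, uses $0, uses the stack pointer, or has immediate 0. The `Insn` record, the rule set in `is_16bit_candidate`, and running the check after register allocation (so that physical registers are known) are assumptions for illustration; the immediate and offset range limits of the real 16-bit formats are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

ZERO, SP = 0, 29  # MIPS numbering: $zero is r0, $sp is r29


@dataclass
class Insn:
    op: str
    rd: Optional[int] = None   # destination register
    rs: Optional[int] = None   # first source (base register for loads/stores)
    rt: Optional[int] = None   # second source register
    imm: Optional[int] = None  # immediate, if any
    size: int = 32             # encoded size in bits


def is_16bit_candidate(i: Insn) -> bool:
    """Hypothetical rules mirroring the patterns from the instruction analysis."""
    srcs = [r for r in (i.rs, i.rt) if r is not None]
    if i.rd is not None and i.rd in srcs:
        return True  # result == one operand: two-address form drops a field
    if ZERO in srcs:
        return True  # $0 operand can be implied by the opcode
    if i.rs == SP:
        return True  # stack-pointer base can be implied by the opcode
    if i.imm == 0:
        return True  # zero immediate can be implied by the opcode
    return False


def replace(insns: List[Insn]) -> List[Insn]:
    """Instruction replacement: shrink tagged candidates to 16 bits."""
    for i in insns:
        if i.size == 32 and is_16bit_candidate(i):
            i.size = 16
    return insns


program = [
    Insn("addu", rd=8, rs=8, rt=9),    # rd == rs
    Insn("addu", rd=8, rs=9, rt=10),   # no pattern matches
    Insn("lw", rd=8, rs=SP, imm=16),   # stack-pointer base
    Insn("addiu", rd=8, rs=9, imm=4),  # no pattern matches
]
print([i.size for i in replace(program)])  # [16, 32, 16, 32]
```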
Scheduling for code size
Objective
• Get more paired 16-bit OPs
Heuristic for code size
• At step t, the data-ready instructions form a priority list {OP16_1^t, OP32_2^t, …}. OP16_1^t is the best candidate if one of the following holds:
  – it is the only candidate
  – there is more than one 16-bit instruction in the priority list
  – at step t-1, an unpaired OP16_j^(t-1) was issued
• Use the original scheduling policy for loop-body BBs to minimize performance degradation
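The heuristic can be sketched as a single-issue list scheduler. The `Op` fields, the numeric priorities, and the cost model in `code_size_bytes` are illustrative assumptions, not details from the slides; in particular, the cost model hypothetically assumes a 32-bit op must start on a word boundary, so a lone 16-bit op before it costs a half-word of padding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    name: str
    size: int      # encoded size in bits: 16 or 32 (known after tagging)
    priority: int  # higher = more urgent, e.g. critical-path length
    deps: List[str] = field(default_factory=list)  # ops that must issue first


def pick(ready: List[Op], prev_unpaired16: bool, in_loop: bool) -> Op:
    ranked = sorted(ready, key=lambda o: -o.priority)
    if in_loop:
        return ranked[0]  # loop-body BB: keep the original (performance) policy
    ops16 = [o for o in ranked if o.size == 16]
    # Prefer OP16_1^t (highest-priority 16-bit op) under the slide's conditions.
    if ops16 and (len(ranked) == 1 or len(ops16) > 1 or prev_unpaired16):
        return ops16[0]
    return ranked[0]


def schedule(ops: List[Op], in_loop: bool = False) -> List[Op]:
    """List-schedule a dependence DAG, one op per step."""
    done, order, prev_unpaired16 = set(), [], False
    while len(order) < len(ops):
        ready = [o for o in ops
                 if o.name not in done and all(d in done for d in o.deps)]
        op = pick(ready, prev_unpaired16, in_loop)
        order.append(op)
        done.add(op.name)
        # A 16-bit op either opens a new pair or closes the pending one.
        prev_unpaired16 = op.size == 16 and not prev_unpaired16
    return order


def code_size_bytes(order: List[Op]) -> int:
    size = 0
    for op in order:
        if op.size == 32 and size % 4:
            size += 2  # hypothetical half-word padding before a 32-bit op
        size += op.size // 8
    return size


ops = [Op("a", 32, 3), Op("b", 16, 2), Op("c", 32, 1), Op("d", 16, 0)]
for in_loop in (False, True):
    order = schedule(ops, in_loop)
    print([o.name for o in order], code_size_bytes(order))
# ['b', 'd', 'a', 'c'] 12   (code-size policy pairs b and d)
# ['a', 'b', 'c', 'd'] 14   (original policy leaves b unpaired)
```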
Experimental results
• Code size compression ratio: about 17%-23%
Scheduling policies: perf = original scheduling; size = code-size scheduling applied to all BBs; perf & size = code-size scheduling applied to non-loop-body BBs only
[Figure: code size reduction (0%-25%) for Uclibc, mp3, mpeg4, 729a, nucleus-demo, L2/L3 and Linux under the Mixed-perf, Mixed-size and Mixed-p&s configurations (legend: perf, size, perf & size)]
Experimental results
• Performance: slight improvement, from 0.6% to 4.6%
[Figure: normalized cycles (0.92-1.01) for mp3, mpeg4 and 729a under instr32, perf, size and perf & size]
Summary
A simple but effective approach to code size reduction
• Focus on instruction analysis with real applications
• Using all registers
• Scheduling policy for code size
• Little hardware cost
About 17%-23% code size compression ratio, with a slight performance improvement of 0.6% to 4.6%
Thank you!