A Decompression Architecture for Low Power Embedded Systems
Lekatsas, H.; Henkel, J.; Wolf, W.;
Computer Design, 2000. Proceedings. 2000 International Conference on 2000 IEEE
Yi-hsin Tseng
Date: 11/06/2007
Outline
Introduction & motivation
Code Compression Architecture
Decompression Engine Design
Experimental results
Conclusion & Contributions of the paper
Our project
Relate to CSE520
Q & A
For Embedded systems
Embedded system architectures are becoming more complicated.
Available memory space is smaller.
A reduced executable program can also indirectly affect the chip's:
Size
Weight
Power consumption
Why code compression/decompression?
Compress the instruction segment of the executable running on the embedded system, reducing the memory requirements and bus transaction overheads
Compression Decompression
Related work on compressed instructions
A logarithmic-based compression scheme where 32-bit instructions map to fixed but smaller width compressed instructions. (The system using memory area only)
Frequently appearing instructions are compressed to 8 bits. (fixed-length 8 or 32 bits)
The compression method in this paper
Gives comprehensive results for the whole system, including:
buses
memories (main memory and cache)
decompression unit
CPU
Architecture in this system (Post-cache)
Reason ?
-Increase the effective cache size
-Improve instruction bandwidth
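The two reasons above can be illustrated with a little arithmetic. The sketch below is not from the paper; it assumes a hypothetical compression ratio (compressed size divided by original size) and fixed 4-byte instructions, and only shows how a cache holding compressed code covers more of the original program and how fewer bytes cross the bus.

```python
def effective_cache_bytes(cache_bytes, compression_ratio):
    """Bytes of original code that fit in a cache holding compressed code.

    compression_ratio = compressed size / original size (hypothetical value).
    """
    return cache_bytes / compression_ratio


def fetched_bytes(instruction_count, compression_ratio, word_bytes=4):
    """Bytes moved over the bus to fetch the given number of instructions."""
    return instruction_count * word_bytes * compression_ratio


# Assumed ratio of 0.5: a 512-byte cache behaves like a 1024-byte one,
# and 100 instructions cost 200 bytes of bus traffic instead of 400.
print(effective_cache_bytes(512, 0.5))
print(fetched_bytes(100, 0.5))
```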
Code Compression Architecture
Use SAMC (Semiadaptive Markov Compression) to compress instructions
Divide instructions into 4 groups based on the SPARC architecture
A short code (3 bits) is prepended to each compressed instruction
4 Groups of Instructions
Group 1 instructions with immediates
Ex: sub %i1, 2, %g3 ; set 5000, %g2
Group 2 branch instructions
Ex: be, bne, bl, bg, ...
Group 3 instructions with no immediates
Ex: add %o1,%o2,%g3 ; st %g1,[%o2]
Group 4 Instructions that are left uncompressed
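How the 3-bit code in front of each compressed instruction could be used is sketched below. The slides do not give the actual code values, so the `GROUP_TAGS` mapping is purely hypothetical, and the compressed payload is modeled as a string of bits for readability.

```python
# Hypothetical 3-bit group tags; the real encodings are not given in the slides.
GROUP_TAGS = {1: 0b000, 2: 0b001, 3: 0b010, 4: 0b011}
TAG_TO_GROUP = {tag: group for group, tag in GROUP_TAGS.items()}


def tag_instruction(group, payload_bits):
    """Prepend the 3-bit group tag to a compressed payload (a bit string)."""
    return format(GROUP_TAGS[group], "03b") + payload_bits


def read_tag(stream):
    """Split a bit string into (group, remaining bits) by reading 3 tag bits."""
    tag = int(stream[:3], 2)
    return TAG_TO_GROUP[tag], stream[3:]


tagged = tag_instruction(3, "10101010")
print(tagged)            # 3 tag bits followed by the 8 payload bits
print(read_tag(tagged))  # the decompressor recovers the group first
```

Reading the tag first lets the decompression engine choose the decoding path for the group before it touches the rest of the instruction.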
The key idea is…
Present an architecture for embedded systems that decompresses offline-compressed instructions at runtime, to reduce power consumption and (in most cases) improve performance
Pipelined Design – group 1 (stage 1)
[Figure labels: Input Compressed Instructions; Index the Dec. Table; Forward instructions]
Pipelined Design – group 3 instructions with no immediates (stage 1)
256-entry table
Instructions with no immediates may appear in pairs.
-> A pair (64 bits) is compressed into one byte.
The 8 bits are used as an index to address the table.
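The table lookup above can be sketched as follows. The slides do not say how the 256 entries are chosen; picking the most frequent adjacent pairs is an assumption made here for illustration, and the instruction words in the example are arbitrary 32-bit values rather than real group 3 encodings.

```python
from collections import Counter


def build_fast_dictionary(words, size=256):
    """Return up to `size` of the most frequent adjacent pairs of 32-bit words.

    Assumption: entries are selected by frequency; overlapping pairs are
    counted, which is a simplification.
    """
    pair_counts = Counter(zip(words, words[1:]))
    return [pair for pair, _ in pair_counts.most_common(size)]


def decompress_byte(table, index):
    """Expand an 8-bit index into the 64-bit pair stored in the table."""
    first, second = table[index]
    return (first << 32) | second


# Arbitrary example words, not real SPARC instructions.
words = [0x11111111, 0x22222222, 0x11111111, 0x22222222, 0x33333333]
table = build_fast_dictionary(words)
print(hex(decompress_byte(table, 0)))
```

A one-byte index replacing a 64-bit pair is what makes this group cheap to decode: the engine performs a single table read instead of a bit-by-bit decode.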
Experimental results
Use different applications:
an algorithm for computing 3D vectors for a motion picture ("i3d")
a complete MPEGII encoder ("mpeg")
a smoothing algorithm for digital images ("smo")
a trick animation algorithm ("trick")
A simulation tool written in C for obtaining performance data for the decompression engine
Experimental results (cont'd)
The decompression engine is application specific.
For each application, build a decoding table and a fast dictionary table that will decompress that particular application only.
Why is performance worse on smo with a 512-byte instruction cache?
- smo does not require a large memory (it executes in tight loops).
- It generates very few misses for this cache size, so the compressed architecture cannot improve an already almost perfect hit ratio, and the slowdown caused by the decompression engine prevails.
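The trade-off can be made concrete with a toy cycle model. It is not the simulator used in the paper: the one-cycle hit cost, the miss penalty, the miss counts, and the decompression stall cycles are all hypothetical numbers chosen only to show the two regimes.

```python
def fetch_cycles(fetches, misses, miss_penalty, decompress_stalls=0):
    """Toy model: one cycle per fetch, plus miss penalties, plus stall
    cycles added by the decompression engine (all values hypothetical)."""
    return fetches + misses * miss_penalty + decompress_stalls


# Tight loop (smo-like): almost no misses, so compression cannot save much.
plain_tight = fetch_cycles(1000, misses=2, miss_penalty=20)
comp_tight = fetch_cycles(1000, misses=1, miss_penalty=20, decompress_stalls=50)

# Miss-heavy code: fewer misses on compressed code outweigh the stalls.
plain_heavy = fetch_cycles(1000, misses=100, miss_penalty=20)
comp_heavy = fetch_cycles(1000, misses=60, miss_penalty=20, decompress_stalls=50)

print(plain_tight, comp_tight)   # compressed is slower here
print(plain_heavy, comp_heavy)   # compressed is faster here
```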
Conclusion & Contributions of the paper
This paper designed an instruction decompression engine as a soft IP core for low power embedded systems.
Applications run faster than on systems with no code compression (due to improved cache performance).
Lower power consumption (due to smaller memory requirements for the executable program and smaller number of memory accesses)
Relate to CSE520
Addresses system performance and power consumption by using a pipelined architecture in the system.
A different architecture design for lower power consumption in embedded systems.
Smaller caches perform better with the compressed architecture; larger caches perform better with the uncompressed architecture (cache hit ratio).
Our project
Goal: How to improve the efficiency of power management in embedded multicore systems
Idea:
Use different power modes within a given power budget, with a global power management policy (in Jun Shen's presentation)
Use the SAMC algorithm and this decompression architecture as another factor to simulate (this paper)
How? SimpleScalar tool set
Try simple functions first, then try the different power modes