Crosscutting Issues: The Rôle of Compilers
• Architects must be aware of current compiler technology
[Figure: Compiler and Architecture]
Modern Compilers
• Front end
• High-level optimisations
– E.g. procedure inlining, loop transformations
• Global optimiser
– Register allocation
• Code generator
– Machine-dependent optimisations
Compiler Technology
• Multiple passes complicate matters
– E.g. common subexpression elimination must assume that a register will be allocated for the temporary value
– E.g. the decision to inline a procedure must be made before the procedure's final size is known
• Register allocation is critical
– Uses graph colouring techniques
– Requires at least 16 registers to be effective
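A minimal C sketch of register allocation by graph colouring follows. It assumes the interference graph is already built as an adjacency matrix (two values interfere when they are live at the same time) and that `k` physical registers are available. Nodes are coloured greedily in index order, and a node with no free colour is marked as spilled. The names `colour_registers`, `MAX_VALS`, `MAX_REGS` and `SPILL` are hypothetical. This is a simplification: production allocators (e.g. Chaitin-style) simplify the graph and choose spill candidates by cost, which this sketch does not attempt.

```c
#include <stdbool.h>

#define MAX_VALS 8    /* hypothetical limit on values (virtual registers) */
#define MAX_REGS 32   /* hypothetical limit on physical registers */
#define SPILL    (-1) /* marker: value is kept in memory instead */

/* Greedily colour an interference graph of n values with k registers.
 * interferes[i][j] is true when values i and j are live simultaneously,
 * so they must not share a register (the matrix is assumed symmetric).
 * On return reg[i] is a register number in [0, k) or SPILL.
 * Returns the number of spilled values. */
int colour_registers(int n, bool interferes[MAX_VALS][MAX_VALS],
                     int k, int reg[MAX_VALS])
{
    int spills = 0;
    if (k > MAX_REGS)
        k = MAX_REGS;                    /* keep used[] indexing in bounds */
    for (int i = 0; i < n; i++) {
        bool used[MAX_REGS] = { false }; /* registers held by coloured neighbours */
        for (int j = 0; j < i; j++) {
            if (interferes[i][j] && reg[j] != SPILL)
                used[reg[j]] = true;
        }
        reg[i] = SPILL;
        for (int r = 0; r < k; r++) {
            if (!used[r]) {
                reg[i] = r;              /* take the lowest free colour */
                break;
            }
        }
        if (reg[i] == SPILL)
            spills++;
    }
    return spills;
}
```

For three values that are all live at once, two registers force one spill while three registers need none. This mirrors the point on the slide: with too few registers, colouring degenerates into spilling.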
Architectural Issues
• How are variables allocated and addressed?
– Stack: local variables (mostly scalars)
– Global data area: global variables, constants, arrays
– Heap: dynamic objects (typically not scalars)
• How many registers are needed?
– Integer: 26 registers
– FP: 20 registers
Aiding Compiler Writers
• Architectures should:
– Be regular (orthogonal instruction set)
– Provide primitives, not solutions
– Simplify trade-offs among alternatives
– Not require run-time interpretation of data known at compile time
• E.g. VAX CALLS interprets a register-save mask at run time, even though the mask is fixed at compile time
Keep it simple!
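To illustrate "don't interpret at run time what is known at compile time", here is a C sketch contrasting a mask-driven register save (in the spirit of VAX CALLS, which tests mask bits on every call) with the straight-line stores a compiler can emit when it already knows which registers need saving. The register count, function names and simulated stack are hypothetical; this models the idea and is not the VAX implementation.

```c
#include <stdint.h>

#define NUM_SAVEABLE 12   /* hypothetical: registers R0..R11 may appear in the mask */

/* Run-time interpretation: loop over every mask bit on each call and
 * push the selected registers onto a simulated stack starting at sp.
 * Returns the number of registers pushed. */
int save_by_mask(uint16_t mask, const int64_t regs[NUM_SAVEABLE],
                 int64_t stack[], int sp)
{
    int pushed = 0;
    for (int r = 0; r < NUM_SAVEABLE; r++) {   /* loop and bit test on every call */
        if (mask & (1u << r)) {
            stack[sp + pushed] = regs[r];
            pushed++;
        }
    }
    return pushed;
}

/* Compile-time binding: the compiler knows only R6 and R7 need saving,
 * so it emits exactly two stores with no loop or bit tests. */
int save_r6_r7(const int64_t regs[NUM_SAVEABLE], int64_t stack[], int sp)
{
    stack[sp] = regs[6];
    stack[sp + 1] = regs[7];
    return 2;
}
```

Both routines leave the same values on the stack. The difference is that the first pays for a loop and twelve bit tests on every call to rediscover information that was fixed when the program was compiled.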
Compiler Support for Multimedia Instructions
• SIMD instructions act on multiple smaller data items in a large “word”
– Solutions, not primitives!
– Too few registers!
– Data types not found in programming languages!
Result: only used by low-level graphics libraries.
Multimedia Instructions
• These SIMD instructions act like a “mini-vector” architecture
– E.g. MMX: one 64-bit word holds
• 8 × 8-bit elements
• 4 × 16-bit elements
• 2 × 32-bit elements
– SSE: 128 bits
– Much more limited than genuine vector processors
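The "8 × 8-bit elements in a 64-bit word" idea can be emulated in portable C with ordinary 64-bit operations (a SIMD-within-a-register technique, not the MMX instructions themselves). The key property of a partitioned add is that carries never cross lane boundaries, so each 8-bit lane wraps independently. The function name `add_u8x8` is hypothetical.

```c
#include <stdint.h>

/* Add eight unsigned 8-bit lanes packed into one 64-bit word.
 * Each lane wraps modulo 256 and never carries into its neighbour. */
uint64_t add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8080808080808080ULL;  /* top bit of every lane */
    /* Add the low 7 bits of each lane; each sum fits in 8 bits,
     * so no carry can leave its lane. */
    uint64_t low = (a & ~high) + (b & ~high);
    /* Restore each lane's top bit: a7 XOR b7 XOR (carry into bit 7). */
    return low ^ ((a ^ b) & high);
}
```

For example, adding 0x01 to a lane holding 0xFF yields 0x00 in that lane and leaves the next lane untouched, whereas a plain 64-bit add would ripple a carry into the neighbouring byte.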
Putting It All Together: MIPS
• 64-bit load/store design
• RISC features:
– GPR, load-store architecture
– Small, simple instruction set
– Designed for efficient pipelining (fixed-length instructions)
– Efficient compiler target
MIPS
• 32 64-bit integer registers
– R0…R31
– R0 is hardwired to 0
• 32 floating-point registers, each holding a 32-bit or 64-bit value
– Supports “paired single” operations (two 32-bit values in one 64-bit register)
MIPS Data Types
• Integer:
– Bytes, 16-bit halfwords, 32-bit words, 64-bit doublewords
– Operations are all 64-bit; narrower data are zero- or sign-extended when loaded
• Floating point:
– 32-bit and 64-bit
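Because every integer operation works on 64 bits, a byte, halfword or word loaded from memory must be widened to fill the register. The C sketch below models the two choices, roughly corresponding to MIPS signed loads (LB, LH, LW) and unsigned loads (LBU, LHU, LWU). The function names are hypothetical and the code models only the data movement, not memory access.

```c
#include <stdint.h>

/* Model how narrow memory data lands in a 64-bit register.
 * Signed loads replicate the sign bit; unsigned loads fill with zeros. */
uint64_t load_byte_signed(uint8_t m)    { return (uint64_t)(int64_t)(int8_t)m; }
uint64_t load_byte_unsigned(uint8_t m)  { return (uint64_t)m; }
uint64_t load_half_signed(uint16_t m)   { return (uint64_t)(int64_t)(int16_t)m; }
uint64_t load_half_unsigned(uint16_t m) { return (uint64_t)m; }
uint64_t load_word_signed(uint32_t m)   { return (uint64_t)(int64_t)(int32_t)m; }
uint64_t load_word_unsigned(uint32_t m) { return (uint64_t)m; }
```

The same byte 0xF0 becomes 0xFFFFFFFFFFFFFFF0 (that is, -16) through a signed load but 0xF0 (240) through an unsigned load, so the compiler must pick the load that matches the source-language type.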
MIPS Addressing Modes
• Only immediate and displacement
– 16-bit displacements/immediates
– Register-indirect: set displacement = 0
– 16-bit absolute: use R0 as the base register
• Byte addressable with 64-bit addresses
• Big-endian or little-endian
• Alignment required
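The addressing rules above can be sketched in C: the effective address is a base register plus a sign-extended 16-bit displacement, register-indirect is displacement 0, and absolute addressing uses R0 (always 0) as the base. The sketch also includes an alignment check and shows how the same four bytes form different words under big- and little-endian ordering. All names are hypothetical, and memory is modelled as a plain byte array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Displacement addressing: Regs[base] + sign-extended 16-bit displacement.
 * regs[0] is assumed to hold 0, modelling R0. */
uint64_t effective_address(const uint64_t regs[32], unsigned base, uint16_t disp)
{
    return regs[base] + (uint64_t)(int64_t)(int16_t)disp;
}

/* An access of size bytes (1, 2, 4 or 8) is aligned when the address
 * is a multiple of size. */
bool is_aligned(uint64_t addr, unsigned size)
{
    return (addr & (uint64_t)(size - 1)) == 0;
}

/* Assemble a 32-bit word from four consecutive bytes in either byte order. */
uint32_t load_word(const uint8_t *mem, uint64_t addr, bool big_endian)
{
    uint32_t w = 0;
    for (unsigned i = 0; i < 4; i++) {
        /* Big-endian: the lowest address holds the most significant byte. */
        unsigned shift = big_endian ? 8u * (3u - i) : 8u * i;
        w |= (uint32_t)mem[addr + i] << shift;
    }
    return w;
}
```

With R5 = 0x1000, a displacement of 0xFFF8 (that is, -8) gives 0xFF8. The bytes 12 34 56 78 read as 0x12345678 big-endian but as 0x78563412 little-endian.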
MIPS Instructions
• Three instruction formats (field widths in bits):
I-type: opcode (6) | rs (5) | rt (5) | immediate (16)
J-type: opcode (6) | offset (26)
R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
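The fixed 32-bit formats make decoding a matter of shifts and masks. The C sketch below packs R-type fields and extracts the fields of all three formats. The function names are hypothetical. As a check it encodes the well-known MIPS32 `add` (opcode 0, funct 0x20) with rd = R8, rs = R9, rt = R10.

```c
#include <stdint.h>

/* Pack R-type fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
uint32_t encode_r(unsigned op, unsigned rs, unsigned rt, unsigned rd,
                  unsigned shamt, unsigned funct)
{
    return ((uint32_t)(op & 0x3F) << 26) | ((uint32_t)(rs & 0x1F) << 21) |
           ((uint32_t)(rt & 0x1F) << 16) | ((uint32_t)(rd & 0x1F) << 11) |
           ((uint32_t)(shamt & 0x1F) << 6) | (uint32_t)(funct & 0x3F);
}

/* Field extractors; opcode, rs and rt sit in the same place in every format. */
unsigned opcode_of(uint32_t insn) { return insn >> 26; }
unsigned rs_of(uint32_t insn)     { return (insn >> 21) & 0x1F; }
unsigned rt_of(uint32_t insn)     { return (insn >> 16) & 0x1F; }
unsigned rd_of(uint32_t insn)     { return (insn >> 11) & 0x1F; }
unsigned shamt_of(uint32_t insn)  { return (insn >> 6) & 0x1F; }
unsigned funct_of(uint32_t insn)  { return insn & 0x3F; }

/* I-type: 16-bit immediate, sign-extended. */
int32_t imm_of(uint32_t insn)     { return (int16_t)(insn & 0xFFFF); }

/* J-type: 26-bit offset field. */
uint32_t offset_of(uint32_t insn) { return insn & 0x03FFFFFFu; }
```

Keeping opcode, rs and rt in fixed positions across formats means register fields can be read before the instruction type is fully decoded, which is part of what makes the format pipeline-friendly.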
MIPS Operations
• Load-store
• ALU operations
– Add, subtract, multiply, divide, and, or, xor, LUI (load upper immediate), shifts
• Control transfer
– Set conditions
– Branch (reg = 0, reg ≠ 0, reg1 = reg2, reg1 ≠ reg2), jump, jump-and-link (call)
– Conditional move
• Floating point
– Paired single operations
– Multiply-add (DSP)
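Since immediates are only 16 bits, a 32-bit constant is typically built in two steps: LUI places 16 bits in the upper half of the low word, and ORI fills in the lower 16 bits. The C sketch below models both instructions as they behave on a 64-bit MIPS (LUI sign-extends its 32-bit result; ORI zero-extends its immediate). The helper names are hypothetical.

```c
#include <stdint.h>

/* LUI model: immediate into bits 31..16, low 16 bits zero,
 * then the 32-bit result is sign-extended to 64 bits. */
uint64_t lui(uint16_t imm)
{
    return (uint64_t)(int64_t)(int32_t)((uint32_t)imm << 16);
}

/* ORI model: OR in a zero-extended 16-bit immediate. */
uint64_t ori(uint64_t rs, uint16_t imm)
{
    return rs | (uint64_t)imm;
}

/* Build a 32-bit constant from its two halves: LUI then ORI. */
uint64_t load_const32(uint32_t value)
{
    return ori(lui((uint16_t)(value >> 16)), (uint16_t)(value & 0xFFFF));
}
```

`load_const32(0x12345678)` yields 0x12345678. Because LUI sign-extends, a constant with bit 31 set, such as 0x8000FFFF, ends up as 0xFFFFFFFF8000FFFF in the 64-bit register.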
MIPS: Instruction Usage
• Integer applications:
– Load, add, branch, store, or, compare
• FP applications:
– Add (int), load (int), load, multiply, add, store
Figure 2.34.
Another View: Trimedia Media Processor
• Embedded processor for multimedia applications
– E.g. set-top boxes (decoders, etc.) and TVs
• Very different architecture
– 128 32-bit registers (FP or int)
– Partitioned (SIMD) instructions
– 2’s complement and saturating arithmetic
– VLIW architecture
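The difference between 2's complement (wrapping) and saturating arithmetic matters for media data: when adding to a bright pixel, wrapping produces a dark one, while saturating clamps to the maximum. The C sketch below shows both behaviours for unsigned 8-bit and signed 16-bit values. The function names are hypothetical, and the code models the arithmetic only, not Trimedia's instructions.

```c
#include <stdint.h>

/* 2's complement (modular) addition of unsigned bytes. */
uint8_t add_u8_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);          /* result taken modulo 256 */
}

/* Saturating addition of unsigned bytes: clamp at 255. */
uint8_t add_u8_sat(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255u ? 255u : (uint8_t)s;
}

/* Saturating addition of signed 16-bit samples: clamp to the int16_t range. */
int16_t add_s16_sat(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;       /* exact sum fits in 32 bits */
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}
```

Adding 100 to a pixel value of 200 gives 44 with wrapping arithmetic but 255 with saturation. The saturated result is almost always the visually correct one.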
Trimedia: VLIW Approach
• Compiler can group up to five instructions for simultaneous execution
– Must be independent
– Use NOPs if there are insufficient independent instructions
• Large program size
• Trimedia uses memory compression
• Programs are 2-3 times larger than MIPS (even with compression)!
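Why VLIW code grows can be seen with a toy packer: operations are placed in program order into five-slot instruction words, a new word starts when the current one is full or when an operation needs a result produced in the same word, and empty slots become NOPs. This C sketch rests on simplifying assumptions: each operation has at most one dependence, operations are never reordered, and latencies are ignored. The names `pack_bundles` and `SLOTS` are hypothetical and do not describe the Trimedia compiler.

```c
#include <stdbool.h>

#define SLOTS 5   /* issue slots per instruction word, per the slide */

/* dep[i] is the index of an earlier op whose result op i needs, or -1.
 * Ops are packed in program order; a new word is opened when the current
 * one is full or when op i depends on an op already in it.
 * bundle_of[i] receives the word number of op i.
 * Returns the number of words; *nops receives the count of padding NOPs. */
int pack_bundles(const int dep[], int n, int bundle_of[], int *nops)
{
    int bundles = 0, filled = 0;
    for (int i = 0; i < n; i++) {
        bool conflict = dep[i] >= 0 && bundles > 0 &&
                        bundle_of[dep[i]] == bundles - 1;
        if (bundles == 0 || filled == SLOTS || conflict) {
            bundles++;               /* open a new instruction word */
            filled = 0;
        }
        bundle_of[i] = bundles - 1;
        filled++;
    }
    *nops = bundles * SLOTS - n;     /* unused slots are filled with NOPs */
    return bundles;
}
```

Seven independent operations fit in two words with three NOPs. A chain of three dependent operations needs three words and twelve NOPs, so most of the encoded slots do no work. This is the source of the large program size noted above.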
Fallacies and Pitfalls
• Pitfall: Designing a “high-level” instruction set to support HLLs
– Such features seldom match the language exactly
– Often too general (VAX CALLS)
• Fallacy: There is such a thing as a typical program
– Programs vary very significantly
• Pitfall: Designing an architecture to reduce code size without considering compilers
– Compilers have a much greater impact on code size
– Start with the densest compiled code
• Pitfall: Expecting good compiled performance for DSPs
– Hand-tuned assembler is faster and more compact
• Fallacy: An architecture with flaws cannot be successful
– 80x86 succeeded despite its flaws!
• Segments, accumulators, stack-based FP
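• Fallacy: You can design a flawless architecture
– All designs have trade-offs
• VAX: code size was considered more important than ease of decoding
• Early RISCs: delayed branches (simple for early pipelines, a burden for later implementations)
• Address space: almost every architecture eventually runs short of address bits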
2.15. Concluding Remarks
• 1960s: Stack architectures
– Matched the compiler technology of the day
• 1970s: CISC era
– Tried to support high-level language features in hardware
• Today: RISC era
– Simple, load-store architectures
• Trends in the 1990s:
– Move to 64 bits
– Conditional instructions
• Eliminating branches
– Optimisation of cache access (prefetch instructions)
– Support for multimedia
– Faster floating point
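Conditional instructions eliminate branches by turning a control dependence into a data dependence. The C sketch contrasts a branching maximum with a branch-free select built from a mask. This models the data flow of a conditional move. Whether a given compiler actually emits a branch or a conditional-move instruction for either version depends on the target and optimisation flags. The function names are hypothetical.

```c
#include <stdint.h>

/* Branching version: naturally compiles to a compare and conditional branch. */
int64_t max_branch(int64_t a, int64_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free select: mask is all ones when cond is true, all zeros
 * otherwise, so exactly one of x or y survives the AND/OR blend. */
int64_t select_if(int cond, int64_t x, int64_t y)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0);
    return (int64_t)(((uint64_t)x & mask) | ((uint64_t)y & ~mask));
}

/* Maximum without a branch: the comparison result feeds the select. */
int64_t max_select(int64_t a, int64_t b)
{
    return select_if(a > b, a, b);
}
```

Both versions compute the same result. The select version executes the same instruction sequence regardless of the data, so there is no branch to mispredict. The cost is that both candidate values must be computed before the select.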
The Future
• Trend towards VLIW architectures
• Increased use of conditional execution
• Blending of general-purpose and DSP architectures
• Emulating 80x86 architecture