8/11/2019 Advcomp Slides Juhe
Advanced Compiler Design and Implementation:
Run-Time Support
Juhana Helovuo
Data type representations and instruction set support
Register set and register usage
Activation records and run-time stack
Parameter passing modes
Code for subroutine calls
-
Shared object code
Dynamic typing, heap management, function polymorphism
-
Data type representations
Fixed-size integers: word, halfword, byte
How to treat integers with size < register size
Example: Add 5 to signed byte @ sp+72
Different sizes of loads, stores and arithmetic (M68k)
addi.b #5, (72,a7) ; add immediate byte
Sign/zero-extend on load instructions (Sparc)
ldsb [%sp+72],%l2 ; load signed byte (and extend)
add %l2, 5, %l2 ; add (32-bit)
stb %l2, [%sp+72] ; store byte
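The Sparc sequence above (widen on load, add at register width, narrow on store) can be sketched in C. This is only an illustration: the function name is hypothetical, and the narrowing conversion on the store relies on the two's-complement wraparound that mainstream compilers provide.

```c
#include <stdint.h>

/* Add a constant to a signed byte held in memory, the way a 32-bit
   load/store machine does it. The name is hypothetical. */
void add_to_signed_byte(int8_t *slot, int32_t delta)
{
    int32_t reg = *slot;            /* like ldsb: load and sign-extend to 32 bits */
    reg = reg + delta;              /* like add: full register-width arithmetic   */
    *slot = (int8_t)(uint8_t)reg;   /* like stb: keep only the low 8 bits         */
}
```

The M68k variant needs none of this because the byte-sized add operates on memory directly.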
-
Sign/zero-extend and align with separate instructions (Alpha)
ldq_u r2, 72(sp) ; load quadword unaligned -> r2
lda r1, 72(sp) ; load address -> r1
extbl r2, r1, r3 ; extract 1 byte from r2 -> r3
mskbl r2, r1, r2 ; mask (clear) byte from r2
addq r3, 5, r3 ; add quadword (64-bit)
insbl r3, r1, r3 ; shift byte back in position
or r2, r3, r2 ; combine result & rest of qword
stq_u r2, (r1) ; write quadword back to memory
The general case seems very complex, but case-specific
optimizations often simplify this (register allocation, alignment, BWX)
Very simple memory unit: Only aligned 64-bit loads & stores
For integer size > register size: Use two or four registers
Architecture may provide double load for two consecutive
registers (Sparc) or multiple load (ARM)
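The extract / mask / insert pattern of the Alpha sequence can be mirrored in C on a 64-bit word that has already been loaded. This is a sketch under assumptions: the function name is hypothetical, and byte positions are numbered from the least significant byte (little-endian numbering).

```c
#include <stdint.h>

/* Add delta to the byte at position pos (0..7) inside a 64-bit word
   and return the updated word. Hypothetical helper for illustration. */
uint64_t add_to_byte_in_quad(uint64_t quad, unsigned pos, uint64_t delta)
{
    unsigned shift = (pos & 7u) * 8u;
    uint64_t byte = (quad >> shift) & 0xFFu;             /* like extbl: extract the byte */
    uint64_t rest = quad & ~((uint64_t)0xFFu << shift);  /* like mskbl: clear its slot   */
    byte = byte + delta;                                 /* like addq: 64-bit add        */
    byte = (byte & 0xFFu) << shift;                      /* like insbl: move back        */
    return rest | byte;                                  /* like or: combine             */
}
```

Only the low 8 bits of the sum are kept, so an overflow of the byte does not disturb its neighbours.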
-
Long arithmetic
Use carry flag for addition & subtraction (Sparc)
addcc %i1, %i3, %l0 ; add low words, generate carry
addx %i2, %i4, %l1 ; add high words + carry
Or use unsigned less than-comparison (Alpha)
addq a0, a2, t0 ; add low words
addq a1, a3, t1 ; add high words
cmpult t0, a0, t2 ; generate carry: t2 = (t0 < a0)
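The comparison-based carry can be written out in C for a two-word addition. A minimal sketch, assuming a hypothetical `u128` struct made of two 64-bit halves: after an unsigned add that wraps, the sum is smaller than either operand exactly when a carry occurred.

```c
#include <stdint.h>

/* Hypothetical two-word integer: low and high 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

u128 add_u128(u128 x, u128 y)
{
    u128 sum;
    sum.lo = x.lo + y.lo;               /* add low words (wraps modulo 2^64)    */
    uint64_t carry = (sum.lo < x.lo);   /* unsigned less-than yields the carry  */
    sum.hi = x.hi + y.hi + carry;       /* add high words plus carry            */
    return sum;
}
```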
-
Character strings
C-style strings: Array of characters, end of string marked by character code 0
Pascal-style strings: Character count (integer) followed by an
array of characters
Instruction set support
x86: store string or move string instructions + repeat prefix,
byte-sized operations
Sparc: byte loads and stores
Alpha: insert, extract, mask, zap, cmpbge
PowerPC: load/store string (and compare)
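The two string layouts lead to different code for even the simplest operation, taking the length. The sketch below assumes a one-byte count for the Pascal-style string (as in classic short strings); the type and function names are hypothetical.

```c
#include <stddef.h>

/* Pascal-style string: character count followed by the characters.
   The one-byte count is an assumption made for this sketch. */
typedef struct {
    unsigned char len;
    char data[255];
} pstring;

/* C-style: scan byte by byte until character code 0 is found. */
size_t cstr_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Pascal-style: the length is stored, so no scan is needed. */
size_t pstr_length(const pstring *s)
{
    return s->len;
}
```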
-
Pointers
Usually 32/64-bit words (same as register size)
Naturally aligned: pointer mod sizeof(pointed data) = 0
Array access often requires pointer arithmetic
base pointer + (index * element size)
Element size is often 4 or 8
Special support for address computation
ARM: Data path for second operand contains a shifter unit
Alpha: s4add, s8add, s4sub, s8sub
PowerPC, ARM, Sparc: Indexed addressing mode
lwzx r0,r9,r2 ; r0 := M[r9+r2] (PowerPC)
ld [%i2+%i3], %l1 ; l1 := M[i2+i3] (Sparc)
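The address computation base pointer + (index * element size) reduces to a shift and an add when the element size is a power of two, which is what the scaled-add instructions and shifter operands above provide. A sketch with hypothetical function names; converting the integer address back to a pointer is implementation-defined in C but works on common flat-address platforms.

```c
#include <stdint.h>
#include <stddef.h>

/* base + (index << log2_size), i.e. base + index * element size. */
uintptr_t element_address(uintptr_t base, size_t index, unsigned log2_size)
{
    return base + ((uintptr_t)index << log2_size);
}

/* Load a[index] for 4-byte elements, spelling out the address arithmetic. */
int32_t load_indexed(const int32_t *base, size_t index)
{
    uintptr_t addr = element_address((uintptr_t)base, index, 2);
    return *(const int32_t *)addr;
}
```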
-
Register Usage
Typical RISC has 32 integer registers
(ARM: 16, Itanium: 128, Sparc: register windows, x86: ~8)
Compiler typically has several uses for registers
stack pointer and frame pointer
global offset table pointer (global pointer)
dynamic link and static link
call arguments and return values
local variables
frequently used global variables
temporary values
-
...Register Usage
The compiler should maximize the use of the register set in
order to avoid memory accesses
The partitioning of the register set may be partially
determined by
ISA (Instruction Set Architecture = hardware platform) and
ABI (Application Binary Interface = system software)
ISA usually defines or recommends a stack pointer, possibly
also frame pointer and link register
ABI may define argument and return value registers
ABI must be followed to maintain interoperability with other
compilers and libraries
-
Register partitioning example (Alpha)
v0 = return value, a0..a5 = call arguments, ra=return address
s0..s5 = local/global variables, preserved across calls
t0..t11 = local variables and temporaries, not preserved
pv = call address, gp = global pointer, AT = assembler temp.
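Register map (r0..r31):
r0 v0    r8 t7     r16 a0    r24 t10
r1 t0    r9 s0     r17 a1    r25 t11
r2 t1    r10 s1    r18 a2    r26 ra
r3 t2    r11 s2    r19 a3    r27 pv
r4 t3    r12 s3    r20 a4    r28 AT
r5 t4    r13 s4    r21 a5    r29 gp
r6 t5    r14 s5    r22 t8    r30 sp
r7 t6    r15 fp    r23 t9    r31 zero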
-
The Run-Time Stack
The run-time stack is used to store activation records (stack frames)
Activation records represent procedure activations and they may contain
dynamic link and static link
call arguments and return values
local variables
saved registers (by caller and callee)
procedure call return address
The stack is maintained and accessed through the stack
pointer register, often also by the frame pointer
[Figure: run-time stack showing the current frame and the previous frame, the sp and fp registers, and frame slots addressed as sp+N or fp-M]
-
The activation record is used to communicate between the
caller (main program) and callee (subroutine)
These procedures may be compiled separately
The compiler must adhere to a call convention, or a
procedure call protocol
Parts of the activation record are constructed by the caller
and some parts by the callee
Only the caller may know the size of argument list (C)
Only the callee knows the storage required for local
variables
Both have to be able to access arguments, return value and
links (dynamic, static, return address)
-
Links in Stack Frame
Dynamic Link
Used to find the calling stack frame on return
If the frame size is fixed and static, then there is no need for
this. Just use a constant offset in the code
Static Link
Used to find the last activation of the static parent of the
current frame
Required only in languages allowing nested, local
procedures (e.g. Pascal, Ada, not in C)
Return Address
Used to find the code of the caller on procedure exit
RISCs store the return address into a link register on the call (jump-and-link) instruction
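C has no nested procedures, but the role of the static link can be imitated by passing a pointer to the enclosing procedure's frame explicitly. In this sketch the struct and function names are hypothetical; a compiler for a language such as Pascal would pass the equivalent pointer as a hidden argument.

```c
/* Variables of the enclosing procedure that the nested one must reach. */
struct outer_frame {
    int total;
};

/* "Nested" procedure: reaches the enclosing scope through the static link. */
void inner(struct outer_frame *static_link, int amount)
{
    static_link->total += amount;
}

/* Enclosing procedure: owns the frame and passes its address on each call. */
int outer(int n)
{
    struct outer_frame frame = { 0 };
    for (int i = 1; i <= n; i++)
        inner(&frame, i);
    return frame.total;
}
```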
-
Parameter passing modes
Call by value: Argument value is copied into the callee. The original variable of the caller is not modified during the call.
Default in most languages (except Fortran and Perl)
Call by result: Argument is copied from the callee to the caller. Used to return values.
Call by value-result: Argument is copied both ways.
Call by reference: Callee gets a reference (pointer) to a memory location holding the argument. Callee can modify the argument.
Call by name: Like call by reference, but the argument pointer
expression is recomputed at each access.
The callee is passed a small anonymous function to
compute the address of the argument.
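Three of these modes can be imitated in C, which natively supports only call by value. The sketch below is an illustration with hypothetical names: call by reference is an explicit pointer, and call by name is a small function (a thunk) plus its environment that recomputes the argument's address at every access.

```c
/* Call by value: the callee changes only its own copy. */
void inc_by_value(int x) { x = x + 1; (void)x; }

/* Call by reference: the callee gets the address of the caller's variable. */
void inc_by_reference(int *x) { *x = *x + 1; }

/* Call by name: the callee gets a thunk that computes the argument address. */
typedef int *(*thunk_fn)(void *env);

struct elem_env { int *array; int *index; };

/* Thunk for the argument expression array[index], evaluated at each access. */
int *elem_thunk(void *env)
{
    struct elem_env *e = env;
    return &e->array[*e->index];
}

/* The callee changes *index between accesses, so the same by-name
   argument refers to a different array element each time. */
void zero_all_by_name(thunk_fn arg, void *env, int *index, int n)
{
    for (*index = 0; *index < n; ++*index)
        *arg(env) = 0;
}
```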
-
Procedure Call and Return
Caller's view of a subroutine call
Call
1. Evaluate each argument and place them in argument registers or stack frame
2. Determine the address of the subroutine (mostly done by the linker)
3. Store caller-save registers in stack frame
4. Compute a static link for the subroutine, if necessary
5. Save the return address and jump to the subroutine
Return
1. Restore saved registers from stack
2. Use the return value
-
Epilogue and Prologue
Callee's view of the call
Prologue
1. Save frame pointer, copy stack pointer to frame pointer, compute new stack pointer, i.e. allocate new stack frame
2. Save callee-save registers, if necessary
3. Construct a display (cache of static links), if necessary
Procedure body is executed between the prologue and the
epilogue
Epilogue
1. Restore saved callee-save registers
2. Restore SP from frame pointer and FP from dynamic link
3. Place return value in appropriate register or stack location
4. Jump to return address
-
Call Example
Sample C code
int test_proc(int a1, int a2)
{
int lv1, lv2;
...
return ...;
}
...
r = test_proc(r,4);
Subroutine with two int parameters and two int locals
-
PowerPC calling convention (MacOS X)
stack frames are of static and fixed size
no frame pointer
callee saves as many registers as it uses
frame contains outgoing arguments (incoming arguments in previous frame)
callee may store incoming args in caller's frame if it needs them in memory
Register partitioning
r0        zero/temp
r1        stack ptr
r2        temp
r3        arg0/ret.v.
r4        arg1/ret.v.
r5..r9    arg2..arg6
r10       arg7
r11       temp
r12       indir. branch target
r13..r31  local variables
special registers: link, count, cond, exception
Stack frame structure (from SP towards the previous frame)
SP        old SP
          saved cond
          saved link
          ???
SP+24     arg0 ... argN (outgoing args)
          local variables and temps in memory
          saved registers
old SP    prev. frame
-
PowerPC assembly for example call
Prologue and epilogue
_test_proc:
mflr r0 ; r0 <- link register
-
Procedure-valued variables
Rare in imperative languages, routine in functional languages
C provides function pointers
Simple to implement as plain code pointers
This is sufficient, since there are no local procedures
Nested procedures require procedure values to contain both
code pointer and static link (=closure)
Static link is required to find the local variables of enclosing
scope
Now activation records may have to live even after the
function execution has ended. Stack allocation is not sufficient for all procedures
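A procedure value that pairs a code pointer with a static link can be modelled in C as a two-field struct. This is a sketch with hypothetical names; the environment is owned by the caller here, whereas a real implementation would typically allocate it on the heap so that it can outlive the creating procedure.

```c
/* Variables of the enclosing scope that the procedure value refers to. */
typedef struct {
    int step;
} adder_env;

/* Closure: code pointer plus static link (pointer to the environment). */
typedef struct {
    int (*code)(void *env, int x);
    void *env;
} closure;

/* Body of the nested procedure: finds `step` through the static link. */
int add_step(void *env, int x)
{
    adder_env *e = env;
    return x + e->step;
}

/* Build a procedure value over storage supplied by the caller. */
closure make_adder(adder_env *storage, int step)
{
    storage->step = step;
    closure c = { add_step, storage };
    return c;
}

/* Calling a closure passes the static link as a hidden first argument. */
int call_closure(closure c, int x)
{
    return c.code(c.env, x);
}
```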
-
Position-Independent Code
Required for shared libraries and, more generally, for any dynamically loadable code, e.g. plugin modules
Only one copy of shared code in memory => code cannot be
modified at load time
PIC must be loadable to an arbitrary memory location
Code and data references must work regardless of code
location
Local data references are SP-based => ok
Jumps within the same object module can use relative
addressing => ok
Global data references and jumps from one object module to
another cannot be absolute => use indirect addressing
-
Global Offset Table
Global Offset Table (GOT) is a pointer table used to point to global symbols, whose addresses are not known until
program load time.
Data References
The compiler generates indirect references through the GOT
The link-editor relocates the reference as a GOT offset
The run-time linker fills the GOT with actual symbol
addresses, when it knows where the object will be loaded
Code References
Calls to shared code jump to an element of the Procedure
Linkage Table (PLT)
PLT element contains code to load an address from GOT and
a jump to that address
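The indirection can be imitated in plain C with a function pointer that stands in for a table slot. The sketch below models lazy binding: the slot initially points at a resolver stub, which patches the slot on the first call. All names are hypothetical, and the real mechanism is carried out by the run-time linker on machine code, not by C functions.

```c
typedef int (*func_ptr)(int);

/* Stands in for a subroutine that lives in another shared object. */
int so_aux_impl(int x) { return x * 2; }

int resolve_count = 0;          /* how many times the "linker" ran */
int lazy_stub(int x);

/* Table slot for so_aux: starts out pointing at the resolver stub. */
func_ptr slot_so_aux = lazy_stub;

/* Resolver: look up the symbol, patch the slot, then forward the call. */
int lazy_stub(int x)
{
    resolve_count++;
    slot_so_aux = so_aux_impl;
    return slot_so_aux(x);
}

/* What the call site does: load the address from the slot and jump. */
int call_so_aux(int x)
{
    return slot_so_aux(x);
}
```

After the first call the stub is bypassed, so the resolution cost is paid once per symbol.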
-
GOT Example (Sparc)
Procedure prologue
.LLGETPC0: ; helper function
retl ; to read program counter
add %o7, %l7, %l7 ; %l7 += return address
so_func: ; actual procedure start
save %sp, -112, %sp ; allocate stack frame
sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
call .LLGETPC0
add %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
; now %l7 contains the address of GOT
; _GLOBAL_OFFSET_TABLE_ is a PC-relative symbol
Loading from global data symbol si
; symbol si has relocation type GOT, i.e. it is treated as
; an offset into GOT, not actual memory address
sethi %hi(si), %g1
or %g1, %lo(si), %g1 ; %g1 = GOT offset for si
ld [%l7+%g1], %g1 ; load address of si from GOT
ld [%g1], %i0 ; load value of si
-
Calling via PLT
call so_aux, 0 ; looks normal, but symbol so_aux
; has relocation type PLT
; linker relocates this to .PLT2
The call is to the object module-local PLT, not the actual subroutine
in another object
PLT
.PLT2: ; run-time linked entry
sethi (. - .PLT0), %g1
sethi %hi(so_aux), %g1
jmp %g1+%lo(so_aux)
.PLT3: ; not yet linked entry
sethi (. - .PLT0), %g1
ba,a .PLT0
nop
.PLT0: ; first entry
save %sp, -64, %sp
call dyn_linker
Code from shared object
so_aux:
save ......
-
Dynamic typing and polymorphism
Dynamic typing: The programming language does not associate types to variables, but rather to data values
Variable name can refer to value of any type
Dynamic typing is usually implemented by tagging data values. Each value carries a type tag with it.
The compiler should generate efficient code for resolving the
types of data values and selecting the corresponding
(polymorphic) operation on them, e.g. a+b on integers, floats, strings or bignums.
Modern architectures have very little built-in hardware
support for this
Sparc provides tagged add and subtract instructions
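Tag-based dispatch for a+b can be sketched in C with a tagged union. This is an illustration only: the type and function names are hypothetical, only integers and floats are covered, and real implementations usually pack the tag into spare bits of the value word rather than storing it as a separate field.

```c
#include <stdint.h>

typedef enum { TAG_INT, TAG_FLOAT } value_tag;

/* A dynamically typed value: the type tag travels with the data. */
typedef struct {
    value_tag tag;
    union { int64_t i; double f; } as;
} value;

value make_int(int64_t i)  { value v; v.tag = TAG_INT;   v.as.i = i; return v; }
value make_float(double f) { value v; v.tag = TAG_FLOAT; v.as.f = f; return v; }

/* Polymorphic a+b: inspect the tags, then select the matching operation. */
value add_values(value a, value b)
{
    if (a.tag == TAG_INT && b.tag == TAG_INT)
        return make_int(a.as.i + b.as.i);           /* integer add */
    double x = (a.tag == TAG_INT) ? (double)a.as.i : a.as.f;
    double y = (b.tag == TAG_INT) ? (double)b.as.i : b.as.f;
    return make_float(x + y);                       /* mixed or float add */
}
```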
-
Storage management
Fully manual: malloc and free in C, or new and delete in C++
Automatic deallocation: new but no delete in Java
Fully automatic: All memory management operations implicit, in e.g. Lisp or Haskell
Automatic deallocation is usually based on reference
counting, garbage collection, or combination of both
Manual allocation is usually implemented as a library call
e.g. Doug Lea's dlmalloc library has been shown to
outperform custom memory allocation routines
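Reference counting, one of the bases for automatic deallocation, can be sketched on top of the manual malloc and free calls. The names below are hypothetical, and the sketch ignores cycles and thread safety, which real run-time systems have to handle.

```c
#include <stdlib.h>

/* Heap object with an embedded reference count. */
typedef struct {
    int refcount;
    int payload;
} object;

/* Allocate an object with one reference; returns NULL if malloc fails. */
object *object_new(int payload)
{
    object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->payload = payload;
    return o;
}

/* A new reference to the object is created. */
object *object_retain(object *o)
{
    o->refcount++;
    return o;
}

/* A reference is dropped; frees the object and returns 1 when it was the last. */
int object_release(object *o)
{
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```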
-
Summary
Language semantics and run-time services (dynamic loading, code sharing, memory management) may require
complicated run-time support
It should be possible to optimize away costly parts of the
procedure call mechanism to obtain good call performance.
The amount of required run-time support code depends on
the language and hardware.
Modern RISCs do not have much explicit architectural support for specific high-level languages, but this can be
compensated for in software