Section A Introduction, Overview and IR

Upload: kun-ling

Posted on 03-Jul-2015


TRANSCRIPT

Page 1: A Intro

Section A: Introduction, Overview and IR

Page 2: A Intro

Historical Perspectives

1980-83: Stanford RISC compiler research

1987: MIPS Ucode Compiler (R2000) - global opt under -O2

1989: Cydrome Cydra 5 Compiler - software pipelining

1991: MIPS Ucode Compiler (R4000) - loop opt under -O3

1994: SGI Ragnarok Compiler (R8000) - floating-pt performance

1997: SGI MIPSpro Compiler (R10000)

2000: Pro64/Open64 Compiler (Itanium)

Other influences: Stanford SUIF, Rice IPA

Page 3: A Intro

Open64 Key Events

1994: Development started to compile for MIPS R10000

1996: SGI MIPSpro compiler shipped

1998: Started retargeting to Itanium, changed front-ends to gcc/g++

2000: Pro64 compiler open-sourced via GPL

2001: SGI dropped support, UDel renamed compiler to Open64

2001 – 2004: ORC project funded by Intel for Itanium

2003: PathScale started work to retarget to X86

2004: PathScale X86 compiler shipped in April

Page 4: A Intro

Important Attributes

Most modern design among today's production compilers

Infrastructure designed to facilitate optimization implementation

In production since 1996, open-sourced in 2000

Compatible and inter-operable with gcc/g++

Source constructs, command-line, ABI, linking

Easy to extend/enhance

Easy to retarget to new processors

Widely used today in optimization research

Caters to small-team development environments

Page 5: A Intro

PathScale Compiler Key Contributions

Retargeted to X86/X86-64

Performance leader for 64-bit X86 Linux since 2004

Bridge between GNU and Open64 code base to ease tracking GNU releases

Proprietary OpenMP runtime library

QA and Build infrastructure

Page 6: A Intro

Open64 Overall Design

Compiler infrastructure that can support a full spectrum of optimization capabilities

Major ingredient: a common intermediate representation, WHIRL

Single back-end for multiple front-ends

One IR, multiple levels of representation

Compilation process continuously lowers representation

Specialized components based on optimization disciplines:

LNO: loop-oriented, based on data dependency

WOPT: global scalar optimization based on SSA

IPA: inter-procedural, requiring whole-program analysis

CG: target-dependent

No duplicate efforts among the phases

WHIRL simplifier callable from all phases

Share analysis results

Call each other to get work done

Page 7: A Intro

Role of IR in a Compiler

Support multiple front-ends (languages)

Support multiple processor targets

Medium for performing optimizing transformations

Common interface among phases

Promote modularity in compiler design

Reduce duplicate functionalities

Key infrastructure of a modern compiler

Page 8: A Intro

Semantic Level of IR

Semantic level spans from High (close to the source program) down to Low (close to machine instructions).

At higher level:

• More kinds of constructs

• Shorter code sequence

• More program info present

• Hierarchical constructs

• Cannot perform many optimizations

At lower level:

• Less program info

• Fewer kinds of constructs

• Longer code sequence

• Flat constructs

• All optimizations can be performed

Page 9: A Intro

Open64’s WHIRL IR

WHIRL developed by the Open64 team at SGI:

One IR, multiple levels of representation

Compilation process continuously lowers representation

Each optimization has best level to perform at

Share analysis results

No duplicate efforts among phases

Page 10: A Intro

Compilation Flow

Page 11: A Intro

Optimization Design Philosophy

Perform optimization at the highest level where the transformation is expressible

• More program info to help analysis

• Shorter code sequence to manipulate

• Fewer variations for the same computation

Results:

• Less implementation efforts

• Faster, more efficient optimization

• Greater robustness (easier to test, quicker stabilization)

Page 12: A Intro

Phase Ordering Design Principles

Dictated by lowering process

− Optimizations exposed at lower level must be done late

Early phases transform to expose optimization opportunities

− Inlining

− Constant propagation

− Loop fusion

Early phases compute information for the benefits of later phases

− Aliasing and pointer disambiguation

− Use-def

− Data dependency

Early phases canonicalize the code

− Less code variation for later phases

− May not speed up the program

− Depends on clean-up by later phases

Cheap optimizations applied multiple times

Optimizations that destroy program information applied as late as possible

Page 13: A Intro

Design of WHIRL

WHIRL tree nodes for executable code

WHIRL node defined in common/com/wn_core.h

Each function body represented by one big tree

WHIRL symbol table for declarations

Different tables for different declaration constructs

See common/com/symtab*.h

Smallest WHIRL node is 24 bytes

Packed fields to save space

Binary reader and writer

WHIRL file in ELF file format

Unique WHIRL file suffix according to phase: Front-end: .B, IPA/inliner: .I, LNO: .N, WOPT: .O

ASCII dumper

Page 14: A Intro

Important WHIRL Concepts

operator: name of operation

desc: machine (scalar) type of operands

rtype: machine (scalar) type of result

opcode: the triplet (operator, rtype, desc)

symbol table index: uniquely identify a program symbol

high level type: type construct as declared in program

Important for implementing ANSI alias rule

field-id: uniquely identify a field in a struct or union

New 128-bit SIMD machine types defined for X86's MMX/SSE

Page 15: A Intro

WHIRL Concepts for Optimization

Statement nodes are sequence points

Side effects only possible at statement boundaries

Statements with side effects limited to:

Stores

Calls

asms

Expression nodes are not sequence points

Expression nodes perform computation with no side effects

Allows aggressive expression transformations

Source position information (to support debugging) only applies to statement nodes

Page 16: A Intro

WHIRL Maps

For annotating WHIRL nodes with additional info

Overcome fixed size of WHIRL nodes

Good for transient information

WHIRL nodes classified into categories

Map_id stored in each WHIRL node

map_id unique only within a category

Information stored in map tables with map_id as index

Page 17: A Intro

Very High WHIRL

Preserve abstraction present in the source language

Can be translated back to C/F90 with minor loss of semantics

Originally defined for Fortran 90; usage later extended to C/C++

Constructs allowed only in VH WHIRL:

Comma operator

Nested function calls

C select operator (?:)

For F90: triplet, arrayexp, arrsection, where

Inliner can work at this level

Page 18: A Intro

High WHIRL

Constructs that support loop-level optimizations

Fixed (though not explicit) control flow

Key constructs:

ARRAY (data dependency analysis, vectorization)

DO loops

IF statements

FORTRAN I/O statements

IPA, PREOPT and LNO work at this level

Can be translated back to source language

Allow user to view effects of inlining and LNO

Page 19: A Intro

Mid WHIRL

One-to-one mapping to RISC instructions

Control flow explicit via jumps

Address computation exposed

Bitfield accesses exposed

Complex numbers expanded to operations on float

WOPT works at this level

Uniform representation enhances optimization opportunities

Page 20: A Intro

Low WHIRL

Final form of WHIRL to facilitate translation to machine instructions in CG

Some intrinsics replaced by calls

Linkage convention exposed

Data layout completed