64-bit Insider Newsletter
Volume 1, Issue 13, June 2006, Microsoft
Optimization on Windows
64-bit: Part 1 of 3
Your computer applications can have increased
resources and scalability, thanks to 64-bit processors
and operating systems. However, it is still important to
understand and follow the basic principles of software
optimization. This issue of the 64-bit Insider newsletter
is the first of a three-part series that focuses on different aspects of optimization. In this issue, we discuss many
of the principles and tools for software optimization as
they relate to 64-bit processors. In upcoming issues, we
will examine software optimization for multi-core and
multiprocessor systems, and also for specific 64-bit
processors.
The 64-bit Advantage
The computer industry is
changing, and 64-bit technology
is the next, inevitable step. The
64-bit Insider newsletter will help
you adopt this technology by
providing tips and tricks for a
successful port.
Development for and migration to 64-
bit technology are not as
complicated as the 16-bit to 32-bit transition. However, as with any
new technology, several areas do
require close examination and consideration. The goal of the 64-
bit Insider newsletter is to identify
potential migration issues and
provide viable, effective solutions
to these issues. With a plethora of
Web sites already focused on 64-
bit technology, the intention of
this newsletter is not to repeat
previously published information.
Instead, it will focus on 64-bit
issues that are somewhat isolated
yet extremely important to
understand. It will also connect
you to reports and findings from
64-bit experts.
What is Optimization?
First, we should clarify what we mean by
optimization. Optimization means improving the
efficiency of your application. That can mean
building an application that minimizes its use of some set of computer resources: for example, one
that uses as little RAM as possible, or one that runs
as fast as possible. In some cases, optimization may also
relate to network bandwidth or hard disk space.
Software optimization techniques can require changes at any level of your application:
from the high-level architecture of a multi-component system, to the algorithms you use
to implement small functional units, down to the specific machine-code instructions
used to execute simple statements. Although it can be a mistake to focus too much
on optimization when functionality is still immature, optimization should never be too
far from your mind. Commercial software and custom solutions routinely include performance benchmarks in their requirements specifications.
Optimizing your Application
There are three levels at which you can improve the efficiency of your 32-bit or 64-bit
applications.
1. Enhance hardware (add memory or processors).
2. Make code modifications.
3. Use compiler options.
Enhancing Hardware
Adding processors is the fastest way to gain short-term performance increases, but this
option only works if the processor is the bottleneck in your application, which is not
always the case. However, assuming that your algorithm is optimal and all the other
alternatives discussed in this newsletter have been applied, adding processors is a valid
way to increase the performance level of your application.
A basic axiom of system performance is that Random Access Memory (RAM) storage is more expensive, and
faster, than disk storage. A common technique used to
improve performance is to eliminate or reduce disk
access by expanding available memory and keeping
everything stored in RAM. 64-bit Windows makes this
technique feasible by greatly increasing the amount of
RAM that is available to an application. This technique
is a cheap but effective way to speed certain
applications. For example, databases and Web servers
can make significant performance gains by moving to
64-bit systems that have large amounts of memory.
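As a minimal, hedged sketch of this keep-it-in-RAM technique (the CachedStore class and its key/value scheme are invented for illustration, not taken from any real database), the idea looks like this:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record store: reads are served from an in-memory cache
// after the first (expensive) fetch from disk.
class CachedStore {
public:
    std::string read(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            ++hits_;                            // served from RAM: no disk access
            return it->second;
        }
        std::string value = readFromDisk(key);  // slow path
        cache_[key] = value;                    // keep it in RAM for next time
        return value;
    }
    int hits() const { return hits_; }

private:
    // Stand-in for a real disk read; a database or web server would
    // fetch pages or files here.
    std::string readFromDisk(const std::string& key) {
        return "data:" + key;
    }
    std::map<std::string, std::string> cache_;
    int hits_ = 0;
};
```

On 32-bit Windows, a cache like this is limited by the 2-3 GB user address space; 64-bit Windows removes that ceiling, which is what makes keep-everything-in-RAM designs practical for large data sets.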
Modifying Your Code
Making code modifications does not necessitate
using a new algorithm or changing the design of
the application. This newsletter assumes that
you are already using an optimal design that
works best in your chosen scenarios. Making code modifications can also mean optimizing
the application's use of memory, or using
compiler directives that help the compiler create
code that works better for specific processors. This series of newsletters will provide
several directives that help in this way.
Using Compiler Options
Compilers will not always use Single Instruction, Multiple Data (SIMD) instructions in
certain algorithms. This limitation may be due to the complexity of the loops, or to the
inability of the compiler to guarantee the independence between loop iterations that is required to ensure correct behavior. A suitable compiler directive, or a restructuring
of the loop, may be required to let the compiler know that SIMD will work fine.
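To make the aliasing problem concrete, here is a small illustrative sketch (the function names are invented). In the first function, the compiler must assume dst and src might overlap, which blocks SIMD code generation; in the second, the __restrict keyword (accepted by the Microsoft, Intel, and GNU compilers) asserts that they do not:

```cpp
#include <cassert>
#include <cstddef>

// Without help, the compiler must assume dst and src may alias, so it
// cannot safely issue SIMD loads/stores for several iterations at once.
void scale(float* dst, const float* src, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}

// __restrict promises the compiler that dst and src do not overlap,
// removing the dependence that blocked vectorization. (The Intel
// compiler also accepts #pragma ivdep on the loop itself.)
void scale_simd(float* __restrict dst, const float* __restrict src,
                std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}
```

Both functions compute the same result; the annotation only changes what the optimizer is allowed to assume.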
In addition to compiler directives or switches, command-line options passed to the
compiler and linker enable you to identify conditions under which the compiler can make
further optimizations: conditions that the compiler might not be able to detect on its
own. For example, the /fp:fast option tells the compiler to use faster but less-precise
floating-point instructions.
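A small sketch of why /fp:fast is a trade-off: under strict floating-point semantics the compiler must evaluate expressions in source order, while /fp:fast lets it reassociate operations for speed, which can change results. The two functions below evaluate the same three terms in different orders (the values are chosen only to make the effect visible):

```cpp
#include <cassert>

// Floating-point addition is not associative: (a + b) + c can differ
// from a + (b + c). Under strict semantics the compiler must keep the
// source order; /fp:fast allows it to reorder for speed.
double sum_left_to_right() {
    double big = 1e16, tiny = 1.0;
    return (big + tiny) - big;   // tiny is absorbed by rounding: 0.0
}

double sum_reassociated() {
    double big = 1e16, tiny = 1.0;
    return (big - big) + tiny;   // same terms, reordered: 1.0
}
```

If the low-order bits of your results matter (for example, in scientific code), measure before enabling /fp:fast.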
Compiler Switches
Table 1 identifies some of the compiler switches related to optimization that are
available in the C++ compilers from Microsoft and Intel.

Table 1 Compiler Switches

Microsoft C++         Intel C/C++           Description
/O1                   -O1                   Creates the smallest possible code.
/O2                   -O2                   Creates faster code, possibly increasing size.
-                     -O3                   Creates even faster code (sometimes).
/Oa                   -fno-alias            Assumes no aliasing in the application, and enables some register and loop optimizations.
/Ow                   -fno-fnalias          Assumes no aliasing between functions, but allows aliasing within functions.
/Ob                   -Ob                   Controls inline expansion. Inlining functions reduces function-call overhead.
/Og                   -                     Combines several types of optimizations.
/Oi                   -fbuiltin             Enables inlining of intrinsic functions to replace some common functions.
/Os                   -Os                   Optimizes for speed, but favors small code.
/Ot                   -                     Favors speed over size.
/Ox                   -                     Provides maximum optimization.
/arch                 -mtune, -mcpu, -Qx    Uses Streaming SIMD Extensions (SSE) or SSE2 instructions.
/G5                   -mtune, -mcpu, -Qx    Favors the Pentium processor.
/G6                   -mtune, -mcpu, -Qx    Favors the Pentium Pro, II, III, and Pentium 4 processors.
/G7                   -mtune, -mcpu, -Qx    Favors the Pentium 4 and AMD Athlon processors.
/fp:fast              -fp-model fast        Enables more aggressive optimizations on floating-point data, possibly sacrificing precision.
/GL                   -Qipo                 Yields whole-program optimization / inter-procedural optimization.
/LTCG:PGI, /LTCG:PGO  -prof-gen, -prof-use  Gives profile-guided optimization (the PGI and PGO forms are linker switches).
/favor                -                     Optimizes for the AMD 64-bit processors, the Intel 64-bit processors, or both.
Note: Please review the documentation for both compilers to learn the specific details
and associated caveats for each switch. The Intel compiler has additional options for
high-level and floating-point optimization. Please refer to the Intel documentation for
more information about these options.
Understanding Link-Time Code Generation and Profile-Guided Optimization

Confusion frequently surrounds the terminology and differences between Whole
Program Optimization (or Link-Time Code Generation [LTCG]) and Profile-Guided
Optimization (PGO). This section clarifies the differences.

In the Microsoft Visual Studio documentation, Whole Program Optimization is also
called LTCG. Confusion usually stems from the fact that PGO is also a type of Whole Program Optimization and uses LTCG to get its work done. As a first step toward
clarifying both of these, this newsletter will use only the term LTCG, not the less
accurate term Whole Program Optimization.
Both LTCG and PGO are techniques that allow you to optimize your application
without making changes to your code.
Link-Time Code Generation
LTCG is simply a mechanism by which the generation of machine code (and the
accompanying optimization) is delayed until link time. Strictly speaking, then, it is
not a form of optimization; LTCG just enables additional optimizations. The classic
compile/link process compiles and optimizes files individually and then links them
together. LTCG enables additional optimizations
because it postpones the optimization steps until link
time, when all the object files are available and can
be optimized together.

As shown in Figure 1, there are two differences between
the procedures for compiling and linking when
performed with and without LTCG. First, when using
LTCG, the optimization step is moved to link time. Second, the object files generated by the compiler are
not in the standard Common Object File Format (COFF). They are in a proprietary
format that can change between versions of the compiler. As a result, programs like
DUMPBIN and EDITBIN do not work with these object files.
The proprietary format allows the optimizer to consider all object files during
optimization. This enables more effective inlining, memory
disambiguation, and other inter-procedural optimizations. The executable can also be
better arranged to reduce offsets for things like Thread Local Storage, and to reduce
paging in large executables. Refer to the compiler documentation for a full description
of the optimizations that are enabled by LTCG.
Figure 1 Comparison of compiling procedures with and without LTCG
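As a hypothetical two-file sketch of what LTCG makes possible (the file names and function are invented; the switches are the real MSVC ones): without /GL, the compiler optimizing main.cpp cannot see the body of helper() in the other translation unit and must emit a call; with /GL and /LTCG, the optimizer sees both object files at link time and can inline the call.

```cpp
#include <cassert>

// --- helper.cpp (a separate translation unit in the real build) ---
// With the classic compile/link model, this body is invisible while
// main.cpp is being optimized, so the call below cannot be inlined.
int helper(int x) {
    return x * x + 1;
}

// --- main.cpp ---
int compute() {
    // Under LTCG, the optimizer sees helper()'s body at link time and
    // can inline it here, eliminating the call overhead entirely.
    return helper(7);
}

// Illustrative MSVC build (switches are real; file names are not):
//   cl /c /O2 /GL helper.cpp
//   cl /c /O2 /GL main.cpp
//   link /LTCG helper.obj main.obj
```

The generated code is identical either way in behavior; only the opportunity to optimize across files changes.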
Profile-Guided Optimization

Many optimization techniques are heuristic in nature, and many involve trade-offs between image size and speed. For example, choosing whether to inline a function
depends on how large the function is and how often it is called. Small functions that are
called many times should be inlined. Large functions that are called only once, from a
few locations, should not be inlined. At least, this is usually a safe bet. But you
must also consider situations that fall between these two extremes.
Without more information, the compiler can use only generic algorithms to determine
whether or not to inline functions, because frequently it is not clear how often a function
will be called; for example, the call may be guarded by a condition. Similar problems
exist for branch prediction (which determines the order of switch and other conditional
statements).
PGO is a technique whereby your program is
executed with a representative set of data. The
program's behavior is monitored to determine how
often pieces of code are executed and how often
certain branches are taken. This technique
produces a profile that the optimizer can use to make
better decisions during optimization.

So, PGO provides additional information about your
application's behavior, gathered during a runtime analysis of the application. PGO
provides information that is not available from static analysis of your application
alone. LTCG must still be used; however, PGO gives the optimization process even
more information so that better decisions can be made.
PGO requires a three-step approach, which is also highlighted in Figure 2:

1. Compile and link your application to produce an instrumented version of the program
that gathers information on your application's behavior at runtime. This step
requires the /GL switch for the compiler and the /LTCG:PGI switch for the linker.
2. Execute your application and feed it data or user input that represents what would
be expected from a target user. It is important to choose this data
carefully; otherwise, you will be optimizing your program for irrelevant
scenarios.
3. Re-link your application with the /LTCG:PGO switch to re-optimize the
application by using the new information generated in Step 2.
Figure 2 Profile-Guided Optimization three-step approach
Again, please consult the compiler documentation for a full description of the kinds of
optimizations that can be performed during PGO.
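The three steps can be sketched with illustrative MSVC command lines (shown as comments) around a tiny program whose hot path a profile would reveal; the workload, file names, and function are hypothetical.

```cpp
#include <cassert>

// Step 1: build an instrumented binary.
//   cl /c /O2 /GL app.cpp
//   link /LTCG:PGI app.obj           (produces an instrumented app.exe)
//
// Step 2: run the instrumented binary on representative input.
//   app.exe < typical_workload.txt   (records execution counts)
//
// Step 3: re-link using the collected profile.
//   link /LTCG:PGO app.obj           (produces the optimized app.exe)

// In this hypothetical app, profiling would show that the non-negative
// branch is taken for almost all representative inputs, so PGO can lay
// out the code in favor of that hot path.
int classify(int value) {
    if (value >= 0)
        return 1;    // hot path in representative data
    return -1;       // cold path
}
```

Note that the program's source is identical before and after PGO; only the build steps, and therefore the generated code layout, change.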
Summary
Optimization means maximizing your application's performance by reducing its use of
expensive or slow resources, without reducing the application's ability to
do work. Although spending money on extra hardware can help, appropriate changes to
how your application is written or built can yield substantial performance gains as well.
Assuming that your algorithms are sound, compiler flags should be the first place you
look to increase your application's performance. Both LTCG and PGO can be enabled
by using compiler flags, and both can substantially improve performance without changing a
single line of code.