64-bit Insider Newsletter
Volume 1, Issue 13, June 2006, Microsoft
Optimization on Windows
64-bit: Part 1 of 3
Your computer applications can have increased
resources and scalability, thanks to 64-bit processors
and operating systems. However, it is still important to
understand and follow the basic principles of software
optimization. This issue of the 64-bit Insider newsletter
is the first of a three-part series that focuses on different aspects of optimization. In this issue, we discuss many
of the principles and tools for software optimization as
they relate to 64-bit processors. In upcoming issues, we
will examine software optimization for multi-core and
multiprocessor systems, and also for specific 64-bit
processors.
The 64-bit Advantage
The computer industry is
changing, and 64-bit technology
is the next, inevitable step. The
64-bit Insider newsletter will help
you adopt this technology by
providing tips and tricks for a
successful port.
Development for and migration to 64-
bit technology are not as
complicated as the 16-bit to 32-bit transition. However, as with any
new technology, several areas do
require close examination and consideration. The goal of the 64-
bit Insider newsletter is to identify
potential migration issues and
provide viable, effective solutions
to these issues. With a plethora of
Web sites already focused on 64-
bit technology, the intention of
this newsletter is not to repeat
previously published information.
Instead, it will focus on 64-bit
issues that are somewhat isolated
yet extremely important to
understand. It will also connect
you to reports and findings from
64-bit experts.
What is Optimization?
First, we should clarify what we mean by
optimization. Optimization means improving the
efficiency of your application. That can mean
building an application that minimizes its use of some set of computer resources: for example, one
that uses as little RAM as possible, or one that runs
as fast as possible. In some cases, optimization may also
relate to network bandwidth or hard disk space.
Software optimization techniques can require changes at any level of your application:
from the high-level architecture of a multi-component system, to the algorithms you use
to implement small functional units, down to the specific machine-code instructions
used to execute simple statements. Although it can be a mistake to focus too much
on optimization when functionality is still immature, optimization should never be too
far from your mind. Commercial software and custom solutions routinely include performance benchmarks in their requirements specifications.
Optimizing your Application
There are three levels at which you can improve the efficiency of your 32-bit or 64-bit
applications.
1. Enhance hardware (add memory or processors).
2. Make code modifications.
3. Use compiler options.
Enhancing Hardware
Adding processors is the fastest way to gain short-term performance increases, but this
option only works if the processor is the bottleneck in your application, which is not
always the case. However, assuming that your algorithm is optimal and all the other
alternatives discussed in this newsletter have been applied, adding processors is a valid
way to increase the performance level of your application.
A basic axiom of system performance is that Random Access Memory (RAM) storage is more expensive, and
faster, than disk storage. A common technique used to
improve performance is to eliminate or reduce disk
access by expanding available memory and keeping
everything stored in RAM. 64-bit Windows makes this
technique feasible by greatly increasing the amount of
RAM that is available to an application. This technique
is a cheap but effective way to speed certain
applications. For example, databases and Web servers
can make significant performance gains by moving to
64-bit systems that have large amounts of memory.
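As a minimal, hedged sketch of this keep-it-in-RAM technique (the CachedStore class and its key/value scheme are invented for illustration, not taken from any real database), the idea looks like this:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record store: reads are served from an in-memory cache
// after the first (expensive) fetch from disk.
class CachedStore {
public:
    std::string read(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            ++hits_;                            // served from RAM: no disk access
            return it->second;
        }
        std::string value = readFromDisk(key);  // slow path
        cache_[key] = value;                    // keep it in RAM for next time
        return value;
    }
    int hits() const { return hits_; }

private:
    // Stand-in for a real disk read; a database or web server would
    // fetch pages or files here.
    std::string readFromDisk(const std::string& key) {
        return "data:" + key;
    }
    std::map<std::string, std::string> cache_;
    int hits_ = 0;
};
```

On 32-bit Windows, a cache like this is limited by the 2-3 GB user address space; 64-bit Windows removes that ceiling, which is what makes keep-everything-in-RAM designs practical for large data sets.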
Modifying Your Code
Making code modifications does not necessitate
using a new algorithm or changing the design of
the application. This newsletter assumes that
you are already using an optimal design that
works best in your chosen scenarios. Making code modifications can also mean optimizing
the application's use of memory, or using
compiler directives that help the compiler create
code that works better for specific processors. This series of newsletters will provide
several directives that help in this way.
Using Compiler Options
Compilers will not always use Single Instruction, Multiple Data (SIMD) instructions in
certain algorithms. This limitation may be due to the complexity of the loops, or to the
inability of the compiler to guarantee the independence between loop iterations that is required to ensure correct behavior. A suitable compiler directive, or a restructuring
of the loop, may be required to let the compiler know that SIMD will work fine.
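To make the aliasing problem concrete, here is a small illustrative sketch (the function names are invented). In the first function, the compiler must assume dst and src might overlap, which blocks SIMD code generation; in the second, the __restrict keyword (accepted by the Microsoft, Intel, and GNU compilers) asserts that they do not:

```cpp
#include <cassert>
#include <cstddef>

// Without help, the compiler must assume dst and src may alias, so it
// cannot safely issue SIMD loads/stores for several iterations at once.
void scale(float* dst, const float* src, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}

// __restrict promises the compiler that dst and src do not overlap,
// removing the dependence that blocked vectorization. (The Intel
// compiler also accepts #pragma ivdep on the loop itself.)
void scale_simd(float* __restrict dst, const float* __restrict src,
                std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}
```

Both functions compute the same result; the annotation only changes what the optimizer is allowed to assume.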
In addition to compiler directives or switches, command-line options passed to the
compiler and linker enable you to identify conditions under which the compiler can make
further optimizations: conditions that the compiler might not be able to detect on its
own. For example, the /fp:fast option tells the compiler to use faster but less-precise
floating-point instructions.
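A small sketch of why /fp:fast is a trade-off: under strict floating-point semantics the compiler must evaluate expressions in source order, while /fp:fast lets it reassociate operations for speed, which can change results. The two functions below evaluate the same three terms in different orders (the values are chosen only to make the effect visible):

```cpp
#include <cassert>

// Floating-point addition is not associative: (a + b) + c can differ
// from a + (b + c). Under strict semantics the compiler must keep the
// source order; /fp:fast allows it to reorder for speed.
double sum_left_to_right() {
    double big = 1e16, tiny = 1.0;
    return (big + tiny) - big;   // tiny is absorbed by rounding: 0.0
}

double sum_reassociated() {
    double big = 1e16, tiny = 1.0;
    return (big - big) + tiny;   // same terms, reordered: 1.0
}
```

If the low-order bits of your results matter (for example, in scientific code), measure before enabling /fp:fast.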
Compiler Switches
Table 1 identifies some of the compiler switches related to optimization that are
available in the C++ compilers from Microsoft and Intel.

Table 1 Compiler Switches

Microsoft C++         Intel C/C++           Description
/O1                   -O1                   Creates the smallest possible code.
/O2                   -O2                   Creates faster code, possibly increasing size.
-                     -O3                   Creates even faster code (sometimes).
/Oa                   -fno-alias            Assumes no aliasing in the application, and enables some register and loop optimizations.
/Ow                   -fno-fnalias          Assumes no aliasing between functions, but allows aliasing within functions.
/Ob                   -Ob                   Controls inline expansion. Inlining functions reduces function-call overhead.
/Og                   -                     Combines several types of optimizations.
/Oi                   -fbuiltin             Enables inlining of intrinsic functions to replace some common functions.
/Os                   -Os                   Optimizes for speed, but favors small code.
/Ot                   -                     Favors speed over size.
/Ox                   -                     Provides maximum optimization.
/arch                 -mtune, -mcpu, -Qx    Uses Streaming SIMD Extensions (SSE) or SSE2 instructions.
/G5                   -mtune, -mcpu, -Qx    Favors the Pentium processor.
/G6                   -mtune, -mcpu, -Qx    Favors the Pentium Pro, II, III, and Pentium 4 processors.
/G7                   -mtune, -mcpu, -Qx    Favors the Pentium 4 and AMD Athlon processors.
/fp:fast              -fp-model fast        Enables more aggressive optimizations on floating-point data, possibly sacrificing precision.
/GL                   -Qipo                 Yields whole-program optimization / inter-procedural optimization.
/LTCG:PGI, /LTCG:PGO  -prof-gen, -prof-use  Gives profile-guided optimization (the PGI and PGO forms are linker switches).
/favor                -                     Optimizes for the AMD 64-bit processors, the Intel 64-bit processors, or both.
Note: Please review the documentation for both compilers to learn the specific details
and associated caveats for each switch. The Intel compiler has additional options for
high-level and floating-point optimization. Please refer to the Intel documentation for
more information about these options.
Understanding Link-Time Code Generation and Profile-Guided Optimization

Confusion frequently surrounds the terminology and differences between Whole
Program Optimization (or Link-Time Code Generation [LTCG]) and Profile-Guided
Optimization (PGO). This section clarifies the differences.

In the Microsoft Visual Studio documentation, Whole Program Optimization is also
called LTCG. Confusion usually stems from the fact that PGO is also a type of Whole Program Optimization and uses LTCG to get its work done. As a first step toward
clarifying both of these, this newsletter will use only the term LTCG, not the less
accurate term Whole Program Optimization.
Both LTCG and PGO are techniques that allow you to optimize your application
without making changes to your code.
Link-Time Code Generation
LTCG is simply a mechanism by which the generation of machine code (and the
accompanying optimization) is delayed until link time. Strictly speaking, then, it is
not a form of optimization; LTCG just enables additional optimizations. The classic
compile/link process compiles and optimizes files individually and then links them
together. LTCG enables additional optimizations
because it postpones the optimization steps until link
time, when all the object files are available and can
be optimized together.

As shown in Figure 1, there are two differences between
the procedures for compiling and linking when
performed with and without LTCG. First, when using
LTCG, the optimization step is moved to link time. Second, the object files generated by the compiler are
not in the standard Common Object File Format (COFF). They are in a proprietary
format that can change between versions of the compiler. As a result, programs like
DUMPBIN and EDITBIN do not work with these object files.
The proprietary format allows the optimizer to consider all object files during
optimization. This enables more effective inlining, memory
disambiguation, and other inter-procedural optimizations. The executable can also be
better arranged to reduce offsets for things like Thread Local Storage, and to reduce
paging in large executables. Refer to the compiler documentation for a full description
of the optimizations that are enabled by LTCG.
Figure 1 Comparison of compiling procedures with and without LTCG
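As a hypothetical two-file sketch of what LTCG makes possible (the file names and function are invented; the switches are the real MSVC ones): without /GL, the compiler optimizing main.cpp cannot see the body of helper() in the other translation unit and must emit a call; with /GL and /LTCG, the optimizer sees both object files at link time and can inline the call.

```cpp
#include <cassert>

// --- helper.cpp (a separate translation unit in the real build) ---
// With the classic compile/link model, this body is invisible while
// main.cpp is being optimized, so the call below cannot be inlined.
int helper(int x) {
    return x * x + 1;
}

// --- main.cpp ---
int compute() {
    // Under LTCG, the optimizer sees helper()'s body at link time and
    // can inline it here, eliminating the call overhead entirely.
    return helper(7);
}

// Illustrative MSVC build (switches are real; file names are not):
//   cl /c /O2 /GL helper.cpp
//   cl /c /O2 /GL main.cpp
//   link /LTCG helper.obj main.obj
```

The generated code is identical either way in behavior; only the opportunity to optimize across files changes.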
Profile-Guided Optimization

Many optimization techniques are heuristic in nature, and many involve trade-offs between image size and speed. For example, choosing whether to inline a function
depends on how large the function is and how often it is called. Small functions that are
called many times should be inlined. Large functions that are called only once, from a
few locations, should not be inlined. At least, this is usually a safe bet. But you
must also consider situations that fall between these two extremes.
Without more information, the compiler can use only generic algorithms to determine
whether or not to inline functions, because frequently it is not clear how often a function
will be called; for example, the call may be guarded by a condition. Similar problems
exist for branch prediction (which determines the order of switch and other conditional
statements).
PGO is a technique whereby your program is
executed with a representative set of data. The
program's behavior is monitored to determine how
often pieces of code are executed and how often
certain branches are taken. This technique
produces a profile that the optimizer can use to make
better decisions during optimization.

So, PGO provides additional information about your
application's behavior, gathered during a runtime analysis of the application. PGO
provides information that is not available from static analysis of your application
alone. LTCG must still be used; however, PGO gives the optimization process even
more information so that better decisions can be made.
PGO requires a three-step approach, which is also highlighted in Figure 2:

1. Compile and link your application to produce an instrumented version of the program
that gathers information on your application's behavior at runtime. This step
requires the /GL switch for the compiler and the /LTCG:PGI switch for the linker.
2. Execute your application and feed it data or user input that represents what would
be expected from a target user. It is important to choose this data
carefully; otherwise, you will be optimizing your program for irrelevant
scenarios.
3. Re-link your application with the /LTCG:PGO switch to re-optimize the
application by using the new information generated in Step 2.
Figure 2 Profile-Guided Optimization three-step approach
Again, please consult the compiler documentation for a full description of the kinds of
optimizations that can be performed during PGO.
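The three steps can be sketched with illustrative MSVC command lines (shown as comments) around a tiny program whose hot path a profile would reveal; the workload, file names, and function are hypothetical.

```cpp
#include <cassert>

// Step 1: build an instrumented binary.
//   cl /c /O2 /GL app.cpp
//   link /LTCG:PGI app.obj           (produces an instrumented app.exe)
//
// Step 2: run the instrumented binary on representative input.
//   app.exe < typical_workload.txt   (records execution counts)
//
// Step 3: re-link using the collected profile.
//   link /LTCG:PGO app.obj           (produces the optimized app.exe)

// In this hypothetical app, profiling would show that the non-negative
// branch is taken for almost all representative inputs, so PGO can lay
// out the code in favor of that hot path.
int classify(int value) {
    if (value >= 0)
        return 1;    // hot path in representative data
    return -1;       // cold path
}
```

Note that the program's source is identical before and after PGO; only the build steps, and therefore the generated code layout, change.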
Summary
Optimization means maximizing your application's performance by reducing its use of
expensive or slow resources, without reducing the application's ability to
do work. Although spending money on extra hardware can help, appropriate changes to
how your application is written or built can yield substantial performance gains as well.
Assuming that your algorithms are sound, compiler flags should be the first place you
look to increase your application's performance. Both LTCG and PGO can be enabled
by using compiler flags, and both can substantially improve performance without changing a
single line of code.