
64-bit Insider Newsletter
Volume 1, Issue 13, June 2006, Microsoft

Optimization on Windows 64-bit: Part 1 of 3

Your computer applications can have increased resources and scalability, thanks to 64-bit processors and operating systems. However, it is still important to understand and follow the basic principles of software optimization. This issue of the 64-bit Insider newsletter is the first of a three-part series that focuses on different aspects of optimization. In this issue, we discuss many of the principles and tools for software optimization as they relate to 64-bit processors. In upcoming issues, we will examine software optimization for multi-core and multiprocessor systems, and also for specific 64-bit processors.

The 64-bit Advantage

The computer industry is changing, and 64-bit technology is the next, inevitable step. The 64-bit Insider newsletter will help you adopt this technology by providing tips and tricks for a successful port.

Development and migration of 64-bit technology is not as complicated as the 16-bit to 32-bit transition. However, as with any new technology, several areas do require close examination and consideration. The goal of the 64-bit Insider newsletter is to identify potential migration issues and provide viable, effective solutions to these issues. With a plethora of Web sites already focused on 64-bit technology, the intention of this newsletter is not to repeat previously published information. Instead, it will focus on 64-bit issues that are somewhat isolated yet extremely important to understand. It will also connect you to reports and findings from 64-bit experts.


What is Optimization?

First, we should clarify what we mean by optimization. Optimization means improving the efficiency of your application. That can mean building an application that minimizes its use of some set of computer resources: for example, one that uses as little RAM as possible, or runs as fast as possible. In some cases, optimization may also relate to network bandwidth or hard disk space.

Software optimization techniques can require changes at any level of your application: from the high-level architecture of a multi-component system, to the algorithms you use to implement small functional units, down to the specific machine-code instructions you use to execute simple statements. Although it can be a mistake to focus too much on optimization when functionality is still immature, optimization should never be too far from your mind. Commercial software and custom solutions routinely include performance benchmarks in their requirements specifications.

Optimizing your Application

There are three levels at which you can improve the efficiency of your 32-bit or 64-bit applications:

1. Enhance hardware (add memory or processors).
2. Make code modifications.
3. Use compiler options.

Enhancing Hardware

Adding processors is the fastest way to gain short-term performance increases. But this option only works if the processor is the bottleneck in your application, which is not always the case. However, assuming that your algorithm is optimal and all the other alternatives discussed in this newsletter have been applied, adding processors is a valid way to increase the performance level of your application.

A basic axiom of system performance is that Random Access Memory (RAM) storage is more expensive, and faster, than disk storage. A common technique used to improve performance is to eliminate or reduce disk access by expanding available memory and keeping everything stored in RAM. 64-bit Windows makes this technique feasible by greatly increasing the amount of RAM that is available to an application. This technique is a cheap but effective way to speed up certain applications. For example, databases and Web servers can make significant performance gains by moving to 64-bit systems that have large amounts of memory.
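The keep-it-in-RAM technique can be sketched as a cache placed in front of a slow I/O path. This is an illustrative sketch, not code from the newsletter; RecordCache and load_from_disk are hypothetical names, and load_from_disk stands in for any real disk access.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: keep previously loaded records in RAM instead of
// re-reading them from disk on every request. On 64-bit Windows such a
// cache can grow far beyond the 2 GB address space of a 32-bit process.
class RecordCache {
public:
    std::string get(int key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;   // RAM hit: fast
        std::string value = load_from_disk(key);     // miss: slow I/O
        cache_.emplace(key, value);
        return value;
    }
private:
    // Placeholder for the real (expensive) disk read.
    std::string load_from_disk(int key) {
        return "record-" + std::to_string(key);
    }
    std::unordered_map<int, std::string> cache_;
};
```

The trade-off is simply memory for time: the second request for the same key never touches the disk.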



Modifying Your Code

Making code modifications does not necessitate using a new algorithm or changing the design of the application. This newsletter assumes that you are already using an optimal design that works best in your chosen scenarios. Making code modifications can also mean optimizing the application's use of memory, or using compiler directives that help the compiler create code that works better for specific processors. This series of newsletters will provide several directives that help in this way.

Using Compiler Options

Compilers will not always use Single Instruction, Multiple Data (SIMD) instructions in certain algorithms. This limitation may be due to the complexity of the loops, or to the inability of the compiler to guarantee the independence between loop iterations that is required to ensure correct behavior. A suitable compiler directive, or a restructuring of the loop, may be required to let the compiler know that SIMD will work fine.
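As a sketch of the kind of hint described above (our example, not the newsletter's): marking pointers with __restrict, which MSVC and GCC/Clang both accept, promises the compiler that the arrays do not overlap, removing the aliasing doubt that otherwise blocks vectorization.

```cpp
#include <cstddef>

// Each iteration is independent once the compiler knows dst, a, and b
// do not alias, so under /O2 (or -O2) it can emit SIMD instructions
// that add several floats at a time. Without __restrict, the compiler
// may have to assume dst overlaps a or b and emit scalar code.
void add_arrays(float* __restrict dst,
                const float* __restrict a,
                const float* __restrict b,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Depending on your compiler, a vectorization pragma on the loop can serve the same purpose; consult the compiler documentation for the exact spelling.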

In addition to compiler directives or switches, command-line options passed to the compiler and linker enable you to identify conditions under which the compiler can make further optimizations, conditions that the compiler might not be able to detect on its own. For example, the /fp:fast option tells the compiler to use faster but less-precise floating-point instructions.
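For instance (our illustration, not the newsletter's), /fp:fast matters for reductions like the one below, because the fast model lets the compiler reorder the additions, which can change rounding in the last bits of the result.

```cpp
// Under the default (precise) floating-point model, the compiler must
// add the elements in source order. Compiled with cl /O2 /fp:fast it
// may reassociate or vectorize the reduction, so the low-order bits of
// the result can differ between the two builds.
double sum(const double* x, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += x[i];
    return total;
}
```

Use /fp:fast only where such small differences in rounding are acceptable to your application.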

Compiler Switches

Table 1 identifies some of the optimization-related compiler switches available in the C++ compilers from Microsoft and Intel.

Table 1 Compiler Switches

Microsoft C++   Intel C/C++          Description
/O1             -O1                  Creates the smallest possible code.
/O2             -O2                  Creates faster code, possibly increasing size.
/O3             -O3                  Creates even faster code (sometimes).
/Oa             -fno-alias           Assumes no aliasing in the application, and enables some register and loop optimizations.
/Ow             -fno-fnalias         Assumes no aliasing between functions, but allows aliasing within functions.
/Ob             -Ob                  Controls inline expansion. Inlining functions reduces function-call overhead.
/Og             -                    Combines several types of optimizations.
/Oi             -fbuiltin            Enables inlining of intrinsic functions to replace some common functions.
/Os             -Os                  Optimizes for speed, but favors small code.
/Ot             -                    Favors speed over size.
/Ox             -                    Provides maximum optimization.
/arch           -mtune, -mcpu, -Qx   Uses Streaming SIMD Extensions (SSE) or SSE2 instructions.
/G5             -mtune, -mcpu, -Qx   Favors the Pentium processor.
/G6             -mtune, -mcpu, -Qx   Favors the Pentium Pro, Pentium II, Pentium III, and Pentium 4 processors.
/G7             -mtune, -mcpu, -Qx   Favors the Pentium 4 and AMD Athlon processors.
/fp:fast        -fp-model fast       Enables more aggressive optimizations on floating-point data, possibly sacrificing precision.
/GL             -Qipo                Enables whole-program / inter-procedural optimization.
/PGI, /PGO      -prof-gen            Gives profile-guided optimization.
/favor          -                    Optimizes for the AMD or Intel 64-bit processors, or for both.

Note: Please review the documentation for both compilers to learn the specific details and associated caveats for each switch. The Intel compiler has additional options for high-level and floating-point optimization; refer to the Intel documentation for more information about these options.

Understanding Link-Time Code Generation and Profile-Guided Optimization

Confusion frequently surrounds the terminology and differences between Whole Program Optimization (or Link-Time Code Generation [LTCG]) and Profile-Guided Optimization (PGO). This section clarifies the differences.

In the Microsoft Visual Studio documentation, Whole Program Optimization is also called LTCG. Confusion usually stems from the fact that PGO is also a type of Whole Program Optimization and uses LTCG to get its work done. As a first step toward clarifying both of these, this newsletter will only use the term LTCG, not the less accurate term Whole Program Optimization.

Both LTCG and PGO are techniques that allow you to optimize your application without making changes to your code.

Link-Time Code Generation

LTCG is simply a mechanism by which the generation of machine code (and the accompanying optimization) is delayed until link time. So, strictly speaking, it is not a form of optimization; LTCG just enables additional optimizations. The classic compile/link process compiles and optimizes files individually and then links them together. LTCG enables additional optimizations because it postpones the optimization steps until link time, when all the object files are available and can be optimized together.

As shown in Figure 1, there are two differences between the procedures for compiling and linking when performed with and without LTCG. First, when using LTCG, the optimization step is moved to link time. Second, the object files generated by the compiler are


not in the standard Common Object File Format (COFF). They are in a proprietary format that can change between versions of the compiler. As a result, programs like DUMPBIN and EDITBIN do not work with these object files.

The proprietary format allows the optimizer to consider all object files during optimization. This consideration enables more effective inlining, memory disambiguation, and other inter-procedural optimizations. Also, the executable can be better arranged to reduce offsets for things like Thread Local Storage, and to reduce paging in large executables. Refer to the compiler documentation for a full description of the optimizations that are enabled by LTCG.
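A minimal illustration of the cross-object-file optimization LTCG enables (our sketch; the function names are hypothetical). In a classic build these two functions would live in separate .cpp files, so the call could not be inlined at compile time; compiling both with /GL and linking with /LTCG lets the optimizer inline across the file boundary.

```cpp
// util.cpp -- compiled separately, this definition is invisible to
// other translation units at compile time, so calls to it from other
// files normally cannot be inlined.
int clamp01(int x) { return x < 0 ? 0 : (x > 1 ? 1 : x); }

// main.cpp -- built with: cl /O2 /GL util.cpp main.cpp /link /LTCG
// The optimizer sees both object files at link time and can inline
// clamp01 into scale, eliminating the call overhead entirely.
int scale(int x) { return clamp01(x) * 100; }
```

(The two files are shown as one listing for brevity; the behavior of the code is the same either way.)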

Figure 1 Comparison of compiling procedures with and without LTCG

Profile-Guided Optimization

Many optimization techniques are heuristic in nature, and many involve trade-offs between image size and speed. For example, choosing whether to inline a function depends on how large the function is and how often it is called. Small functions that are called many times should be inlined. Large functions that are called only once, from a few locations, should not be inlined. At least, this practice is usually a safe bet. But you must also consider situations that fall between these two extremes.

Without more information, only generic algorithms can be used to determine whether or not to inline functions, and frequently it is not clear how often a function is called; for example, the function may be guarded by a condition. Similar problems exist for branch prediction (which determines the ordering of switch and other conditional statements).

[Figure 1 diagram: Without LTCG, each C++ source file is compiled and optimized into a COFF OBJ, and the linker combines the OBJs into the EXE. With LTCG, each source file is compiled with /GL into a CIL OBJ, and the optimizer runs at link time (/LTCG) across all the OBJs before the EXE is produced.]


PGO is a technique whereby your program can be executed with a representative set of data. Your program's behavior is monitored to determine how often pieces of code have been executed and how often certain branches have been taken. This technique produces a profile that the optimizer can use to make better decisions during optimization.

So, PGO provides additional information about your application's behavior that is gathered during a runtime analysis of the application. PGO provides information that is not available during static analysis of your application alone. LTCG must still be used; however, PGO gives the optimization process even more information so that better decisions can be made.

PGO requires a three-step approach, which is also highlighted in Figure 2:

1. Compile and link your application to produce an instrumented program version that gathers information on your application's behavior at runtime. This step requires the /GL switch for the compiler and the /LTCG:PGI switch for the linker.

2. Execute your application and feed it data or user input that represents what would be expected from a target user. It is important to choose this data carefully; otherwise, you will be optimizing your program for irrelevant scenarios.

3. Re-link your application with the /LTCG:PGO switch to re-optimize your application by using the new information generated in Step 2.
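The three steps can be sketched for a single-file program (our example; the file and input names are hypothetical, and the exact switch spellings may vary between compiler versions, so check your compiler documentation):

```cpp
// Build steps (numbers match the list above):
//   1. cl /O2 /GL app.cpp /link /LTCG:PGI    -> instrumented app.exe
//   2. app.exe < representative_input.txt    -> writes a profile
//   3. link app.obj /LTCG:PGO                -> profile-optimized app.exe
//
// A function like this benefits: if the training runs show the error
// branch is almost never taken, PGO can lay out the hot path
// contiguously and bias branch prediction toward it.
int process(int request) {
    if (request < 0)        // cold path in representative runs
        return -1;
    return request * 2;     // hot path
}
```

Step 2 is where representative data matters: a training run that exercises only the error path would teach the optimizer exactly the wrong layout.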

Figure 2 Profile-Guided Optimization three-step approach

Again, please consult the compiler documentation for a full description of the kinds of optimizations that can be performed during PGO.


[Figure 2 diagram: The source is compiled with /GL into a CIL OBJ and linked with /LTCG:PGI to produce an instrumented EXE; running that EXE on sample data generates a profile; re-linking with /LTCG:PGO uses the profile to produce the optimized EXE.]


Summary

Optimization means maximizing your application's performance by reducing its use of expensive or slow resources, and doing so without reducing the application's ability to do work. Although spending money on extra hardware can help, appropriate changes to how your application is written or built can yield substantial performance gains as well. Assuming that your algorithms are sound, compiler flags should be the first place you look to increase your application's performance. Both LTCG and PGO can be enabled by using compiler flags and can substantially improve performance without changing a single line of code.