
Page 1: Performance Engineering Prof. Jerry Breecher Workloads And Tools

Performance Engineering

Prof. Jerry Breecher

Workloads And Tools

Page 2: Workloads and Tools

The issue around workloads is this: you want to be able to represent the reality of some environment. To do this, you must build a model or representation of that environment. This is a workload.

Tools are simply the component pieces of a workload; they may represent the driving part of that workload, or they may form the measurement component.

DEFINITIONS: Reality ===> Model or Abstraction ===> Workload

• Reality is what is. It's essentially un-measurable because it's too complex, too remote, too secret, ....

• A Model is the thinking that goes into abstracting reality.

• A Workload is an attempt to approximate the model.

• A Benchmark is a stylized workload, usually very portable, used to compare various systems.

Page 3: Benchmarks – Application Benchmarks

Many people build benchmarks that mimic their application: they abstract what they feel are the essential components of their application into the benchmark. This gives them portability. They can:

• Try their benchmark on new hardware.

• Run it on a test machine to beat on new applications.

• Play “what if” games – what if we add disks to the machine, for instance.


Page 4: Benchmarks – Popular Benchmarks

 Computation Benchmarks – they pretty much depend on the speed of the hardware and the efficiency of the compiler. Useful for hardware comparisons.

• Sieve of Eratosthenes – determines prime numbers; consists of a series of loops. (A minimal sketch follows this list.)

• Whetstone – a synthetic benchmark designed to measure the behavior of scientific programs.

• Dhrystone – claims to represent system programming environments. Generally integer rather than floating point arithmetic.

• SPEC – benchmarks developed by the Systems Performance Evaluation Cooperative. Widely used on UNIX systems, where the code is extremely portable. Contains the kinds of activities commonly found in engineering and scientific environments (compiles, matrix inversions, etc.)

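The Sieve is simple enough to sketch in a few lines. Here is a minimal Python version of the kind of loop-heavy kernel such a benchmark times; the problem size of 1,000,000 is an arbitrary choice for illustration, not part of any standard:

```python
import time

def sieve(limit):
    """Classic Sieve of Eratosthenes: count the primes below limit."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Mark every multiple of i (starting at i*i) as composite.
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return sum(is_prime)

if __name__ == "__main__":
    start = time.perf_counter()
    count = sieve(1_000_000)   # scale the problem size to your machine
    elapsed = time.perf_counter() - start
    print(f"{count} primes below 1,000,000 in {elapsed:.3f} s")
```

Because the kernel exercises only loops and integer arithmetic, its run time tracks the hardware and the language implementation, which is exactly why it is useful for hardware comparisons and little else.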

Page 5: Benchmarks – Popular Benchmarks

Application/System Benchmarks – they pretty much depend on the efficiency of the OS and application. Useful for software comparisons.

• WebStone – measures the number of accesses that can be made remotely on a target module. Measures network message handling, web server behavior, and file lookup.

• WebBench – measures how many accesses to a web server can be accomplished in a given time.

• TPC – a series of Transaction Processing Performance Council benchmarks. They are generally database oriented. A typical "transaction" involves doing a query on several data items and then updating those items.

• AIM – a series of operating system actions (scheduling, page faults, disk writes, IPC, etc.). Each of the actions is relatively atomic and can be run either in standalone/separate mode or as a bundle of tests.


Page 6: Benchmarks – Workload Characterization

Major issues include:

• How to characterize/define/enumerate the load on your product – what parameters to use in a quantification. This means abstracting to get a model.

• How to build a load that matches this abstraction, and thus ideally the real load.

• How to define the boundary of the system – what is load and what is system? Compilers, editors, etc. are a gray area.

Page 7: Benchmarks – Workload Characterization

Characteristics of a model include:

• Representativeness and accuracy: does the model match reality?

• Flexibility: is the model extensible, so it can match a changing real load?

• Simplicity: reducing construction cost and the complexity of gathering information.

• Compactness: is the model easy to use and inexpensive to run?

• System independence: is the model portable?

• Reproducibility: what degree of control does the user have over the model?

So: what is the relative importance of these characteristics in a typical development environment?

Example: List some benchmarks/tools that could be used by development groups. How do they fit these characteristics?

Page 8: Benchmarks – Approaches to Characterization

This involves figuring out what behavior to approximate, and then what workload to produce in order to duplicate this behavior. Of the many possible behaviors on a system, which one do we want to single out?

What are the job parameters – that is, what behavior do we focus on?

• Program(s): CPU used by the program; number/type of system calls.

• Disk: number and distribution of disk accesses.

• CPU: number and distribution of machine instructions.

Each of these raw numbers involves means, distributions, etc. interpreted in several ways. For example, disk accesses can be represented as:

 • Seek distributions (seek length profiles)

• Disk busy times

• Response times

• Throughput


Page 9: Benchmarks – Approaches to Characterization

Then, to develop a benchmark that matches the "real" behavior, there are two approaches (in the extreme):

• Establish N jobs, each having the mean behavior.

• Use N jobs with random behavior (having a distribution equivalent to the real one) with a mean matching the defined parameters.

Page 10: Benchmarks – Approaches to Characterization

Example: In a "real" environment, there are 100 people entering data at any one time. The average person completes 20 fields a minute, but there is a typical variation of +/- 5: some people type 15 fields/minute and some get as high as 25 fields/minute.

How would you represent the input from these 100 people? (A sketch of both approaches follows.)
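One hedged way to make the contrast from the previous page concrete: the sketch below builds the 100 data-entry jobs both ways – every job at the mean rate of 20 fields/minute, and jobs drawn from a distribution spanning roughly 15 to 25 fields/minute. The uniform distribution is an assumption; the slide gives only the mean and the +/- 5 variation.

```python
import random
import statistics

N_TYPISTS = 100
MEAN_RATE = 20.0          # fields per minute, from the example
SPREAD = 5.0              # typical variation given in the example

# Approach 1: N jobs, each with exactly the mean behavior.
mean_jobs = [MEAN_RATE] * N_TYPISTS

# Approach 2: N jobs drawn from a distribution with the same mean.
# A uniform distribution over 15..25 is an assumption.
random_jobs = [random.uniform(MEAN_RATE - SPREAD, MEAN_RATE + SPREAD)
               for _ in range(N_TYPISTS)]

for name, jobs in (("mean-only", mean_jobs), ("distributed", random_jobs)):
    print(f"{name:12s} total = {sum(jobs):7.1f} fields/min, "
          f"stdev = {statistics.stdev(jobs):5.2f}")
```

Both workloads present the same average load; whether the difference in variance matters depends on what the system under test is sensitive to (buffering, contention, etc.).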

Example: A Whetstone program is designed to use the machine instructions found in typical FORTRAN, computationally intensive programs.

• Do the instructions in a Whetstone reflect a realistic computing environment?

• Can we use these MIPS and Whetstones to compare two machines?

• Are TPCs a better way to measure MIPS?

Page 11: Benchmarks – Expressing the Characterization

There are numerous ways to express a component of system behavior.

Example:

Suppose a large number of processes are using the CPU. We can say either of the following:

a) There are 1000 process schedules in a second. The CPU is 55% busy; therefore each process requires 0.55 milliseconds of CPU each time it asks for processing (550 ms of busy time spread over 1000 schedules). This averaging, expressed in a more formal way, is simply

$$X_{ave} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

b) There are 1000 process schedules in a second and the CPU is 55% busy, but there's a wide variation in the processor demand, based on the kind of process or just simple randomness (a particular process needs different amounts of CPU based on where it is in its transaction). Then we'd like to be able to express the CPU required as a mean (as in a)) and also a standard deviation s, given by

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - X_{ave})^2$$
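As a quick check of the arithmetic in a) and of both formulas, here is a small Python sketch; the exponentially distributed per-schedule CPU times are fabricated purely for illustration:

```python
import math
import random

# Simulate per-schedule CPU demand in milliseconds. The 0.55 ms mean
# matches the slide's example: 55% busy / 1000 schedules per second.
random.seed(1)
samples = [random.expovariate(1 / 0.55) for _ in range(1000)]

n = len(samples)
x_ave = sum(samples) / n                                  # (1/n) * sum(X_i)
s = math.sqrt(sum((x - x_ave) ** 2 for x in samples) / (n - 1))

print(f"mean = {x_ave:.3f} ms, sample std dev = {s:.3f} ms")
```

Two systems can report the same mean in a) while having very different deviations in b), which is exactly why the mean alone under-specifies the characterization.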

Page 12: Benchmarks – Representativeness

The model must accurately reflect the behavior of the modeled system.

From a production workload we extract a real workload – some portion of the total/production load. Our goal is to represent the real load in our model.

A model is more compact than a real load. This saves time and money. Is the missing information essential for representativeness? Can representativeness be quantified?

Workload_1 ---> SYSTEM ---> Performance_Index_1 ( PI1 )

Workload_2 ---> SYSTEM ---> Performance_Index_2 ( PI2 )

If the two work loads produce Performance Indices which are the same within some precision, then Workload_1 is representative of Workload_2. Note this representativeness may not hold if the System changes or if PI1 changes (meaning we use a different set of parameters to measure against).
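A hedged sketch of that comparison as code. Everything here is an illustrative assumption: the performance-index values stand in for whatever metric was chosen (throughput, response time, etc.), and the 5% precision threshold is arbitrary:

```python
def is_representative(pi_1, pi_2, precision=0.05):
    """Workload_1 is representative of Workload_2 if their performance
    indices agree within the given relative precision (5% here is an
    arbitrary illustrative threshold)."""
    return abs(pi_1 - pi_2) <= precision * max(abs(pi_1), abs(pi_2))

# Illustrative values: e.g., transactions/second measured on the same system.
pi_1 = 261.0   # Performance_Index_1 from Workload_1
pi_2 = 272.5   # Performance_Index_2 from Workload_2
print(is_representative(pi_1, pi_2))   # True: agrees within 5%
```

Note that the verdict is tied to this system and this index; re-run the comparison whenever either changes.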


Page 13: Benchmarks – Representativeness

How easy is it to find a workload that is representative? Not very easy! Issues include:

• How performance indices depend on workload and system parameters.

• Often the dependence is very non-linear, and effects are non-additive.

• Interactions can exist between the parameters, though these are usually so complex that they must be ignored.

Example: Increasing the level of multiprogramming increases memory usage, which increases paging and CPU-per-process usage.


Page 14: Benchmarks – System Independence

The parameters by which we model a workload should NOT depend on the type of system, on its configuration, or on its software.

Example: Suppose we partially characterize a workload based on the number of paging requests made. Then increasing memory will cause fewer page faults, which may or may not affect user-visible performance.

Vendors suggest benchmarks that will be advantageous to their company – they LOOK for system dependence.

There are ways of being system independent; in fact, that's what open systems are all about.

Characterize logically rather than physically. If you define a test in terms of “lines of C”, it’s much more portable than “lines of assembler”.


Page 15: Benchmarks – System Independence

Example: Suppose we want to measure the performance changes due to different placements of files on the disks (full-disk vs. almost-empty-disk characteristics). Or, another way of saying this: how does disk performance depend on the fullness of the disk? Here are considerations in designing the test:

• Measuring file access time in terms of location on the disk is inadequate. Results should be independent of the manufacturer, size of disk, etc. You'd like to abstract your results as much as possible.

• You must make sure you understand how the files are placed on the disk; a change of placement may affect the results. The way around this is to measure enough files so you remove the randomness inherent in placement.
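A hedged sketch of that last point: time accesses across many files and report the distribution, so that no single file's placement dominates. The file count and sizes are arbitrary illustrative choices.

```python
import os
import statistics
import tempfile
import time

N_FILES = 200
FILE_SIZE = 64 * 1024          # 64 KiB each; an arbitrary illustrative size

with tempfile.TemporaryDirectory() as d:
    # Create the test files first, so placement is settled before timing.
    paths = []
    for i in range(N_FILES):
        p = os.path.join(d, f"f{i}.dat")
        with open(p, "wb") as f:
            f.write(os.urandom(FILE_SIZE))
        paths.append(p)

    times = []
    for p in paths:
        start = time.perf_counter()
        with open(p, "rb") as f:
            f.read()
        times.append((time.perf_counter() - start) * 1000)  # milliseconds

    # A real test would also defeat the OS page cache; omitted for brevity.
    print(f"mean = {statistics.mean(times):.3f} ms, "
          f"stdev = {statistics.stdev(times):.3f} ms over {N_FILES} files")
```

Reporting the spread as well as the mean is what lets you see whether placement randomness has actually been averaged out.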


Page 16: Benchmarks – The Construction of Workload Models

Natural workloads: samples of the production workload that the system processes at the time of the experiment. A sample is generally shorter than the real load. Modeling in this case means choosing the times of data collection. We need both:

• An accurate characterization – we know what parameters to use in describing our load, and what values they should have.

• An accurate implementation – we can find a workload which matches our characterization.


Pros and cons of natural workloads:

• They may be very representative, especially if the natural load is relatively stable.

• System independence is low.

• Not very controllable (only times and durations can be determined). This means poor flexibility and reproducibility.

• Cost to produce is relatively low.

• Usage cost is high because they aren't compact, having a long run time and a great deal of data.

Page 17: Benchmarks – The Construction of Workload Models

Artificial workloads: programs that aren't derived from the production load. We can describe these workloads in terms of the level of parameterization; we can build models to match a real load at any of these levels:

• At the machine instruction level (number of adds, moves, etc.)

• At the C code statement level (number of do statements, etc.)

• At the low-level OS parameter level (number of reschedules/sec.)

• At the system call level (number of get_time_of_day calls/sec.)

• At the application level (number of text lines searched.)

• At the interactive command level (edit, compile, etc.)

Page 18: Benchmarks – The Construction of Workload Models

Alternative methods of building artificial workloads:

1. Construct the probability distributions of the parameters in the real workload. By sampling these distributions, derive the parameters of each job in the artificial workload. (A sketch of this method follows the list.)

2. Extract real jobs from the real workload by sampling it. Use the parameters of each job to characterize a job in the model.

3. Partition the jobs of the real workload into classes, each characterized by similar combinations of parameters. Choose a suitable number of jobs in each class, and use the parameters of each job to characterize a job in the model.

In the two extremes we have:

• Least exact -> find the average values of the parameters and build jobs around those averages.

• Most exact -> understand the characteristics of each of the real jobs and emulate the real system, job for job.
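A minimal sketch of method 1, assuming we have characterized two job parameters (CPU seconds and disk accesses) from the real workload. The normal distributions and their means/spreads stand in for distributions fitted to real measurements; all the numbers are illustrative assumptions:

```python
import random

random.seed(7)

def make_artificial_job():
    """Method 1: sample the measured parameter distributions to derive
    the parameters of one artificial job."""
    return {
        "cpu_secs": max(0.0, random.gauss(2.0, 0.5)),   # CPU demand per job
        "disk_ops": max(0, int(random.gauss(40, 10))),  # disk accesses per job
    }

artificial_workload = [make_artificial_job() for _ in range(100)]

print(artificial_workload[0])
avg_cpu = sum(j["cpu_secs"] for j in artificial_workload) / len(artificial_workload)
print(f"average CPU per job: {avg_cpu:.2f} s")
```

This sits between the two extremes above: jobs vary realistically, but no individual real job is reproduced.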


Page 19: Benchmarks – The Construction of Workload Models

Pros and cons of artificial workloads:

• Useful for extrapolation from a present load to some future, nonexistent load.

• Can be made compact (one program can run with a variety of parameters).

• May be expensive to produce, but easy to run.

• Fairly system independent, especially if developed at a conceptual level. (Obviously, instruction-mix problems are an exception.)

• The more detailed the model, the more representative it will be.

Example: Characterize workloads with which you are familiar in terms of level of parameterization, and most exact / least exact.

Page 20: Benchmarks – The Construction of Workload Models

Example:

Pat is designing a communications server that receives requests from "higher level" routines. The requests are collected by a Request Handler that does nothing but put them into buffers. The Request Processor removes these requests from the buffers on a first-come, first-served basis.

requests -> Request Handler -> Buffers[n] -> Request Processor ->

This product will be used in a wide range of applications; the "higher level" routines typically send packets of 1348 bytes, but other sizes are also possible. In addition, the applications will be placing variable load on the system; loads might range from "very light" to "extremely heavy".

Pat wishes to describe a benchmark (or tool) that can be used to test this product. (The specification of this benchmark is necessary since the Functional Spec requires a description of how the product will perform.)
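As a sketch of what Pat's benchmark driver might look like: every name, the buffer count, and the load levels here are assumptions drawn only from the slide's description (the 1348-byte packet size is the one figure the slide gives).

```python
import queue
import threading
import time

PACKET_BYTES = 1348          # typical request size from the slide
N_BUFFERS = 64               # Buffers[n]; the count is an assumption

buffers = queue.Queue(maxsize=N_BUFFERS)
processed = 0

def request_handler(n_requests, inter_arrival_s):
    """Generates the load: puts requests into the buffers."""
    for _ in range(n_requests):
        buffers.put(b"x" * PACKET_BYTES)   # a fake 1348-byte packet
        time.sleep(inter_arrival_s)        # "very light" ... "extremely heavy"
    buffers.put(None)                      # sentinel: end of test

def request_processor():
    """Drains the buffers first come, first served."""
    global processed
    while (req := buffers.get()) is not None:
        processed += 1

start = time.perf_counter()
t = threading.Thread(target=request_processor)
t.start()
request_handler(n_requests=10_000, inter_arrival_s=0.0)  # heaviest load
t.join()
elapsed = time.perf_counter() - start

print(f"{processed} requests in {elapsed:.2f} s "
      f"({processed / elapsed:,.0f} requests/s)")
```

Varying inter_arrival_s and the packet size sweeps the driver from "very light" to "extremely heavy", which is exactly the flexibility the Functional Spec seems to require of the benchmark.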

Page 21: Benchmarks – The Construction of Workload Models

Example, continued: Given the limited information available, what properties would you recommend this benchmark have? List them in order of preference:

• Representativeness
• Flexibility
• Simplicity
• Compactness
• System independence
• Reproducibility

What metrics are important here; on what basis will this product be judged? Rate them in order. (Remember that alternatives can range from low level to high level.) Should the benchmark be artificial or natural? Describe how you would write the test.

Page 22: Measuring Tools – What Are They?

We're interested in tools that count and sample the activity of a computer. These tools may also generate the requests that they count.

A tool can be thought of as made up of the following components:

• Sensor – performs the actual measurement.

• Transformer – averages / packages / filters / reduces the measurement.

• Indicator – performs the display.
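One way to picture the three components in code. This decomposition (and every name in it) is just an illustration of the idea, not any particular tool:

```python
import statistics
import time

def sensor():
    """Sensor: performs the actual measurement. Here it 'measures' how long
    a trivial operation takes; a real sensor might read a hardware counter."""
    start = time.perf_counter()
    sum(range(10_000))                           # stand-in for measured activity
    return (time.perf_counter() - start) * 1e6   # microseconds

def transformer(raw_samples):
    """Transformer: reduces the raw measurements to summary statistics."""
    return {"n": len(raw_samples),
            "mean_us": statistics.mean(raw_samples),
            "max_us": max(raw_samples)}

def indicator(summary):
    """Indicator: performs the display."""
    print(f"{summary['n']} samples: mean {summary['mean_us']:.1f} us, "
          f"max {summary['max_us']:.1f} us")

indicator(transformer([sensor() for _ in range(100)]))
```

Keeping the three stages separate is what lets you swap, say, a raw trace for a reduced summary without touching the sensor.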

 

Page 23: Measuring Tools – What Are They?

Example:

Here are the outputs of three different tools. Comment on the value of each of them.

 MODULE INFO: %es#m19 VOS Release 14.0.2o G748

MACRO VERS.: April 18, 2000 DATE / TIME: 2000-06-02 15:44:32

MASTER DISK: %es#m19

#hsc_enet.19.2 15:44:45 4 0

#hsc_enet.19.3 15:44:45 15 79

#hsc_enet.19.4 15:44:45 281 114

#hsc_enet.19.5 15:44:46 16 78

Page 24: Measuring Tools – What Are They?

Example:

Here are the outputs of three different tools. Comment on the value of each of them.

 Total Clients: Transactions = 16372 Bytes = 16372

Module Utilization = 10.15% Module Transactions Per Second = 261.00

Module CPU Millisecs Per Transaction = 0.778

Total Clients: Transactions = 17682 Bytes = 17682

Module Utilization = 13.15% Module Transactions Per Second = 289.87

Module CPU Millisecs Per Transaction = 0.907

Page 25: Measuring Tools – What Are They?

Example:

Here are the outputs of three different tools. Comment on the value of each of them.

 INTERVAL REPORT: 00-06-02 18:31:48 edt

Disk accesses that exceeded 10 seconds: 0

Disk accesses that exceeded trigger: 0

File: %es#m9>process_dir_dir>mdl_file

Counts between m and n milliseconds:

Bucket (ms): 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50
Count:         0,    0,    42,   166,    39,     0,     0,     1,     0,     1

What could be done to make this tool better?

Page 26: Measuring Tools – Transformation Technique

By this is meant turning raw data into useful information.

• Reduction (a counting tool) does some data analysis before storing the data.

• Post-reduction / no reduction (a tracing tool) simply stores measured events in chronological order.

The choice of techniques can greatly affect interference. Reduction generally saves disk accesses but costs CPU.

Page 27: Measuring Tools – Hardware, Software, and Hybrid Tools

This description indicates the nature of the tools rather than the target.

Hardware: oscilloscopes, logic analyzers, etc.

Software: programs.

Hybrid: a combination – code causes an electrical signal that is captured by hardware.

Tools are either sampling or event driven.

A sampling device makes a measurement repetitively in a cyclic fashion (every N seconds).

An event driven device records when some external result occurs.

Page 28: Measuring Tools

HARDWARE TOOLS: external monitoring tools which interface to a target system with probes. Electrical signals are received by the collector with no interference to the system. Transformation, analysis, and display occur on the external tool.

Characteristics:

• Little or no interference to the target.

• Excellent time resolution.

• Semi portable - can be used anywhere as long as signals are available.

• Can see software events by observing that an address was executed.

• Memory states can be observed only during a read or write at a particular code location.

• Attaching probes to machines makes some people nervous.

• Terrible to haul to a customer site.

Page 29: Measuring Tools

SOFTWARE TOOLS:

• Code is added to the target process to gather performance data. These software probes collect data, reduce it, and store it in internal buffers.

• Used on system routines for capturing information.

• Software monitoring tools generally have poorer resolution than hardware tools. They are better adapted to slower, less frequent events.

Generally there are two flavors of software collectors:

1. Sampling programs: these sample at constant time intervals. (A sketch follows below.)

Examples: PC histograms – where is a program executing? Disk utilization – sample whether a disk is busy or idle.
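A minimal sketch of a sampling collector in that spirit: it reads a busy/idle flag at a constant interval and estimates utilization. The sampled "device" here is simulated; a real collector would read actual device state.

```python
import random
import time

SAMPLE_INTERVAL_S = 0.01     # constant sampling interval
N_SAMPLES = 500

def device_is_busy():
    """Stand-in for reading the real busy/idle state of a disk."""
    return random.random() < 0.55    # simulate a device that is ~55% busy

busy = 0
for _ in range(N_SAMPLES):
    if device_is_busy():
        busy += 1
    time.sleep(SAMPLE_INTERVAL_S)    # wait out the sampling interval

print(f"estimated utilization: {100 * busy / N_SAMPLES:.1f}%")
```

The estimate's accuracy depends on the number of samples and on the interval being short relative to how fast the device changes state.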

 

Page 30: Measuring Tools

SOFTWARE TOOLS (continued): Generally there are two flavors of software collectors:

2. Counting programs: these perform some action whenever an event occurs. (A sketch follows at the end of this page.)

Examples: Record the number of disk requests. Record the time a user is logged on.

Software tool characteristics:

• Interference with the system may be high (generally 5% is OK if the tool can be turned off).

• There may be lower time resolution, since the tool must depend on the system clock.

• Hardware events are detected only when they cause some software to execute.

Question: Do you use any software tools?
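And a matching sketch of a counting collector: an event hook increments a counter each time the instrumented operation runs. The wrapped function is, of course, just an illustration.

```python
import functools

disk_requests = 0    # event counter maintained by the software probe

def count_event(fn):
    """Wrap an operation so each call is counted as one event."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global disk_requests
        disk_requests += 1
        return fn(*args, **kwargs)
    return wrapper

@count_event
def read_block(block_no):
    return b"\x00" * 512    # stand-in for a real disk read

for b in range(1000):
    read_block(b)

print(f"disk requests counted: {disk_requests}")
```

Unlike the sampler, the counter never misses an event, but it adds a little work to every occurrence, which is where the interference comes from.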

Page 31: Measuring Tools

DISTRIBUTED TOOLS:

There's a strong need to monitor the activities on numerous systems at the same time. An operator may need to observe numerous systems and detect anomalies in any of them. Required properties include:

a) Tools often contain alarms that trigger when predefined limits are exceeded.

b) Tools must run on numerous machines but display on a centralized machine.

c) Analysis of results should take place on each of the machines, so as not to load down the display machine.

d) There should be a way to log all data for playback at a later time.

Tools can also be used to measure the distributed system itself. Networks must have the ability to monitor traffic, congestion, delay times, etc.

Page 32: Measuring Tools – Drivers

Drivers produce a workload. Measurement of this load is obtained either with system metrics or within the program itself.

Example:

A program determines the time required to write to a disk 1000 times. It prints the elapsed time when it's done.
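That example driver is a few lines of Python. The 4 KiB block size and the file name are arbitrary choices; the slide specifies only 1000 writes and an elapsed-time report.

```python
import os
import time

N_WRITES = 1000
BLOCK = b"\x00" * 4096          # 4 KiB per write; the size is an assumption

start = time.perf_counter()
with open("driver_test.dat", "wb") as f:
    for _ in range(N_WRITES):
        f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())    # force each write through to the disk itself
elapsed = time.perf_counter() - start

os.remove("driver_test.dat")
print(f"{N_WRITES} writes in {elapsed:.2f} s "
      f"({N_WRITES / elapsed:,.0f} writes/s)")
```

Without the fsync, the driver would mostly measure the file cache rather than the disk, a small example of the system-boundary question raised earlier.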

Page 33: Capacity Planning – What Is It?

Every shop that follows the above process is building a home-grown benchmark. That benchmark can measure only limited facets of the system. There are numerous difficulties in following this process:

1. There is no standard definition of capacity – the typical installation will talk in terms of Transactions/Second – for whatever transaction they happen to run.

2. A benchmark won't necessarily catch all the capacity needs; it may handle disk and CPU just fine, only for the database to run out of capacity.

3. The Transaction used in the database planning doesn’t remain constant.

4. A benchmark is typically used over and over again. By the time the system is 10X bigger than the original measurement, the test is outdated.


Page 35: Wrapup

This chapter has looked at a number of techniques for developing tools that will tell you about performance.