Page 1

CPE 619 The Art of Data Presentation

Aleksandar Milenković

The LaCASA Laboratory

Electrical and Computer Engineering Department

The University of Alabama in Huntsville

http://www.ece.uah.edu/~milenka

http://www.ece.uah.edu/~lacasa

Page 2

Overview

- Types of Variables
- Guidelines for Preparing Good Charts
- Common Mistakes in Preparing Charts
- Pictorial Games
- Special Charts for Computer Performance
  - Gantt Charts
  - Kiviat Graphs
  - Schumacher Charts
- Decision Maker’s Games

Page 3

Types of Variables

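- Qualitative, ordered: type of computer (supercomputer, minicomputer, microcomputer)
- Qualitative, unordered: type of workload (scientific, engineering, educational)
- Quantitative, discrete: number of processors
- Quantitative, continuous: response time of the system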

Page 4

Guidelines for Preparing Good Charts

1) Require minimum effort from the reader: e.g., label curves directly instead of using a legend box

2) Maximize information: use words in place of symbols; clearly label the axes

Page 5

Guidelines (cont’d)

3) Minimize ink: e.g., drop grid lines and use the ink to show more details of the data

4) Use commonly accepted practices: origin at (0,0); independent variable (cause) along the x axis; dependent variable (effect) along the y axis; linear scales; increasing scales; equal divisions

5) Avoid ambiguity: show the coordinate axes, scale divisions, and origin; identify individual curves and bars

Page 6

Checklist for Good Graphics

- Are both coordinate axes shown and labeled?
- Are the axis labels self-explanatory and concise?
- Are the scales and divisions shown on both axes?
- Are the minimum and maximum of the ranges shown on the axes appropriate to present maximum information?
- Is the number of curves reasonably small?
- Do all graphs use the same scale?
- Is there no curve that can be removed without reducing information?
- Are the curves on a line chart individually labeled?
- Are the cells in a bar chart individually labeled?
- Are all symbols on the graph accompanied by appropriate textual explanations?
- If the curves cross, are the line patterns different to avoid confusion?
- Are the units of measurement indicated?
- Is the horizontal scale increasing from left to right?
- Is the vertical scale increasing from bottom to top?
- Are the grid lines aiding in reading the curves?
- Does this whole chart add to the information available to the reader?
- Are the scales contiguous?
- Is the order of bars in a bar chart systematic?
- If the vertical axis represents a random quantity, are confidence intervals shown?
- Are there no curves, symbols, or texts on the graph that can be removed without affecting the information?
- Is there a title for the whole chart?
- Is the chart title self-explanatory and concise?
- For bar charts with unequal class intervals, are the area and width representative of the frequency and interval?
- Do the variables plotted on this chart give more information than other alternatives?
- Does the chart clearly bring out the intended message?
- Is the figure referenced and discussed in the text of the report?

Page 7

Common Mistakes in Preparing Charts

- Presenting too many alternatives on a single chart: a chart should carry at most 5 to 7 messages, so use at most 6 curves in a line chart, no more than 10 bars in a bar chart, and at most 8 components in a pie chart
- Presenting many y variables on a single chart
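The element limits above can be turned into a quick check before a chart goes into a report. This is a minimal sketch: the chart-kind names and the helper `too_many_elements` are hypothetical, and the thresholds are the ones stated on the slide.

```python
# Upper limits on chart elements, as stated on the slide:
# at most 6 curves per line chart, 10 bars per bar chart,
# and 8 components per pie chart.
MAX_ELEMENTS = {"line": 6, "bar": 10, "pie": 8}


def too_many_elements(chart_kind: str, n_elements: int) -> bool:
    """Return True if a chart of this kind carries more elements than the
    guideline allows (hypothetical helper for illustration)."""
    try:
        limit = MAX_ELEMENTS[chart_kind]
    except KeyError:
        raise ValueError(f"unknown chart kind: {chart_kind!r}") from None
    return n_elements > limit


if __name__ == "__main__":
    print(too_many_elements("line", 7))  # True: consider splitting the chart
    print(too_many_elements("pie", 8))   # False: exactly at the limit
```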

Page 8

Common Mistakes in Charts (cont’d)

- Using symbols in place of text
- Placing extraneous information on the chart, e.g., grid lines, or grid lines of too fine a granularity
- Selecting scale ranges improperly: ranges chosen automatically by plotting programs may not be appropriate

Page 9

Common Mistakes in Charts (cont’d)

Using a line chart in place of a column chart: a line implies continuity, which is meaningless between discrete categories such as CPU types

[Figure: MIPS versus CPU type (8000 through 8300)]

Page 10

Pictorial Games

Using non-zero origins to emphasize the difference

- "Mine is much better than yours" (emphasize the difference)
- "Mine and yours are almost the same" (conceal the difference)

Three-quarter-high rule: height/width ≥ 3/4, i.e., the height of the highest point should be at least ¾ of the horizontal offset of the rightmost point
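A minimal check for the three-quarter-high rule, assuming the height and width are measured in the same units on the drawn chart (e.g., inches). The function names are hypothetical.

```python
def meets_three_quarter_high(height: float, width: float) -> bool:
    """Three-quarter-high rule: the height of the highest plotted point
    should be at least 3/4 of the horizontal offset of the rightmost point."""
    if width <= 0:
        raise ValueError("width must be positive")
    return height / width >= 0.75


def min_height(width: float) -> float:
    """Smallest plotted height that satisfies the rule for a given width."""
    return 0.75 * width


print(meets_three_quarter_high(3.0, 4.0))  # True: exactly 3/4
print(meets_three_quarter_high(1.0, 4.0))  # False: the curve looks flattened
print(min_height(6.0))                      # 4.5
```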

Page 11

Pictorial Games (cont’d)

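Using a double-whammy graph for dramatization: plotting related metrics, which carry essentially the same information, on the same chart so that one effect looks like two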

Page 12

Pictorial Games (cont’d)

Plotting random quantities without showing confidence intervals

[Figure: means of two random variables]

Means are not enough. Overlapping confidence intervals usually mean that the two random quantities are not statistically different.
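A minimal sketch of the overlap test for two systems' mean response times. It assumes a normal approximation for the interval (a Student t quantile would be more appropriate for small samples), and the data and function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.90):
    """Two-sided confidence interval for the mean using the normal
    approximation; small samples would call for a Student t quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    m = mean(samples)
    return (m - half_width, m + half_width)


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical response times (seconds) of two systems.
system_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 10.4, 11.8]
system_b = [11.0, 12.4, 10.6, 12.9, 11.7, 12.2, 11.1, 12.6]

ci_a = confidence_interval(system_a)
ci_b = confidence_interval(system_b)
print(f"A: mean {mean(system_a):.2f}, 90% CI ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"B: mean {mean(system_b):.2f}, 90% CI ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
# The means differ, but the intervals overlap. Under the slide's rule of
# thumb, the data does not show a statistically significant difference.
print("Overlap:", intervals_overlap(ci_a, ci_b))  # Overlap: True
```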

Page 13

Pictorial Games (cont’d)

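Pictograms scaled by height

Mine: performance = 2; Yours: performance = 1

Wrong scaling: doubling the height (and, proportionally, the width) makes Area(MINE) = 4 × Area(YOURS), although MINE is only twice as good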
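The fix is to scale the pictogram's area, not its height, in proportion to the value. The sketch below assumes the icon is scaled uniformly (height and width by the same factor). The function names are hypothetical.

```python
import math


def pictogram_scale(value: float, reference: float) -> float:
    """Linear scale factor (applied to both height and width) that makes the
    pictogram's area proportional to `value` relative to a reference icon."""
    return math.sqrt(value / reference)


def area_ratio(linear_scale: float) -> float:
    """Area ratio produced when both dimensions are scaled by linear_scale."""
    return linear_scale ** 2


mine, yours = 2.0, 1.0
naive = mine / yours                    # scale height (and width) by 2
print(area_ratio(naive))                # 4.0: the picture overstates 2x as 4x
correct = pictogram_scale(mine, yours)  # about 1.414
print(round(area_ratio(correct), 9))    # 2.0
```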

Page 14

Pictorial Games (cont’d)

Using inappropriate cell size in histograms

[Figure: the same response-time data plotted as two frequency histograms, one with cells [0,2), [2,4), [4,6), [6,8), [8,10), [10,12) and one with cells [0,6), [6,12); panel titles "Normal distribution" and "Exponential distribution"]
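The effect of cell size can be reproduced by binning one data set two ways. This is a minimal sketch with hypothetical response times, chosen so that narrow cells look bell-shaped and wide cells look like a decaying distribution. The function name is also hypothetical.

```python
def histogram(samples, cell_width, upper):
    """Count samples falling in the half-open cells
    [0, w), [w, 2w), ... up to `upper`."""
    n_cells = int(upper // cell_width)
    counts = [0] * n_cells
    for x in samples:
        k = int(x // cell_width)
        if 0 <= k < n_cells:
            counts[k] += 1
    return counts


# Hypothetical response times (seconds), 30 observations.
response_times = ([1.0] * 2 + [3.0] * 6 + [5.0] * 12 +
                  [7.0] * 6 + [9.0] * 3 + [11.0] * 1)

print(histogram(response_times, 2, 12))  # [2, 6, 12, 6, 3, 1]: bell-shaped
print(histogram(response_times, 6, 12))  # [20, 10]: looks exponential
```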

Page 15

Pictorial Games (cont’d)

Using broken scales in column charts to amplify differences

[Figure: response time of systems A to F as two column charts, one with a full scale from 0 to 12 and one with a broken scale showing only 9 to 12]
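How much a broken scale exaggerates a difference can be computed directly from the drawn column heights. A minimal sketch follows, with hypothetical response times and a hypothetical function name. The axis start of 9 mirrors the broken-scale chart.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn column heights when the value axis starts at
    axis_start instead of zero."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (a - axis_start) / (b - axis_start)


# Hypothetical response times (seconds) of two systems.
slow, fast = 11.0, 10.0
print(apparent_ratio(slow, fast))       # 1.1: true ratio, axis from 0
print(apparent_ratio(slow, fast, 9.0))  # 2.0: broken scale starting at 9
```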

Page 16

Special Charts for Computer Performance

- Gantt charts
- Kiviat graphs
- Schumacher charts

Page 17

Gantt Charts

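Shows the relative duration of a number of conditions, such as each resource being busy

[Figure: Gantt chart of CPU, IO channel, and network utilization on a 0% to 100% axis; segment labels 60 (CPU); 20, 20 (IO channel); 30, 10, 5, 15 (network)]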
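A Gantt chart can be drawn from a table that gives the percentage of time each combination of busy and idle resources was observed. Exercise 10.3 asks for a chart of this kind. The sketch below draws one text bar per resource. The profile values, the names, and the resolution of 5% per character are hypothetical. The ordering is one simple choice, not necessarily the book's procedure.

```python
# Hypothetical measurement: percentage of time each combination of
# busy (1) / idle (0) states was observed for (CPU, IO channel, network).
RESOURCES = ("CPU", "IO", "NET")
PROFILE = {
    (1, 1, 1): 5, (1, 1, 0): 15, (1, 0, 1): 25, (1, 0, 0): 15,
    (0, 1, 1): 10, (0, 1, 0): 10, (0, 0, 1): 5, (0, 0, 0): 15,
}


def gantt_rows(profile, step=5):
    """Lay out one text bar per resource, one character per `step` percent.

    Combinations are sorted with busy states first, so the first resource's
    busy time is one contiguous segment, the second resource's at most two,
    the third's at most four, and so on."""
    assert sum(profile.values()) == 100, "percentages must add up to 100"
    assert all(p % step == 0 for p in profile.values()), "use multiples of step"
    order = sorted(profile, reverse=True)
    rows, utilization = {}, {}
    for r, name in enumerate(RESOURCES):
        rows[name] = "".join(
            ("#" if combo[r] else ".") * (profile[combo] // step)
            for combo in order
        )
        utilization[name] = sum(p for combo, p in profile.items() if combo[r])
    return rows, utilization


rows, util = gantt_rows(PROFILE)
for name in RESOURCES:
    print(f"{name:>4} |{rows[name]}| {util[name]}%")
```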

Page 18

Example: Data for Gantt Chart

Page 19

Draft of the Gantt Chart

Page 20

Final Gantt Chart

Page 21

Kiviat Graphs

- Radial chart with an even number of metrics
- HB (higher is better) and LB (lower is better) metrics alternate
- Ideal shape: a star

[Figure: Kiviat graph with eight axes: CPU busy, CPU in supervisor state, CPU in problem state, CPU wait, any channel busy, channel only busy, CPU/channel overlap, CPU only busy]

Page 22

Kiviat Graph for a Balanced System

Problem: the metrics are inter-related:

- CPU busy = problem state + supervisor state
- CPU wait = 100 - CPU busy
- Channel only = any channel - CPU/channel overlap
- CPU only = CPU busy - CPU/channel overlap

[Figure: Kiviat graph of a balanced system on the same eight axes]
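Only four of the eight quantities are independent, so the whole graph can be derived from four measurements. Several axes therefore repeat the same information. The sketch below is minimal, and its function name and input values are hypothetical. The HB/LB labels follow from the alternation rule, taking CPU busy as HB.

```python
def kiviat_metrics(problem_state, supervisor_state, any_channel_busy, overlap):
    """Derive the eight Kiviat-graph metrics (percent of time) from four
    measured quantities, using the inter-relations listed above."""
    cpu_busy = problem_state + supervisor_state
    # Axes in the order they appear around the graph; HB and LB alternate.
    metrics = {
        "CPU busy": cpu_busy,                               # HB
        "CPU in supervisor state": supervisor_state,        # LB
        "CPU in problem state": problem_state,              # HB
        "CPU wait": 100 - cpu_busy,                         # LB
        "Any channel busy": any_channel_busy,               # HB
        "Channel only busy": any_channel_busy - overlap,    # LB
        "CPU/channel overlap": overlap,                     # HB
        "CPU only busy": cpu_busy - overlap,                # LB
    }
    if any(not 0 <= v <= 100 for v in metrics.values()):
        raise ValueError("measurements are mutually inconsistent")
    return metrics


# Hypothetical measurements, in percent of elapsed time.
for axis, value in kiviat_metrics(40, 20, 50, 30).items():
    print(f"{axis:>24}: {value}%")
```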

Page 23

Shapes of Kiviat Graphs

[Figure: typical Kiviat graph shapes: CPU keel boat (CPU-bound system), I/O wedge (I/O-bound system), I/O arrow (CPU- and I/O-bound system)]

Page 24

Merrill’s Figure of Merit (FoM)

Performance = {x_1, x_2, x_3, …, x_{2n}}; odd-numbered values are HB metrics and even-numbered values are LB metrics

x_{2n+1} is the same as x_1 (the graph wraps around)

Average FoM = 50%

Page 25

Example: FoM

System A:

Page 26

FoM Example (Cont)

System B:

System B has a higher figure of merit and is therefore better.

Page 27

Figure of Merit: Known Problems

- All axes are considered equal
- Extreme values are assumed to be better
- Utility is not a linear function of FoM
- Two systems with the same FoM are not equally good
- A system with a slightly lower FoM may be better

Page 28

Kiviat Graphs For Other Systems

Kiviat graphs can also be used for networks

[Figure: network Kiviat graph with six axes: application throughput, packets with error, implicit acknowledgements, duplicate packets, link utilization, link overhead]

Page 29

Schumacher Charts

- Performance metrics are plotted in a tabular manner
- Values are normalized with respect to long-term means and standard deviations
- Any observation that lies beyond the mean ± one standard deviation needs to be explained
- See Figure 10.25 in the book
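A minimal sketch of the normalization behind a Schumacher chart. It treats an observation "beyond the mean ± one standard deviation" as one whose normalized value has a magnitude greater than 1. The metric names, the data, and the function name are hypothetical.

```python
from statistics import mean, stdev


def schumacher_flags(history, current):
    """Normalize current observations against long-term means and standard
    deviations, flagging those outside mean +/- one standard deviation.

    history: dict metric -> list of past observations
    current: dict metric -> latest observation
    Returns dict metric -> (normalized value, needs_explanation)."""
    report = {}
    for metric, value in current.items():
        m, s = mean(history[metric]), stdev(history[metric])
        z = (value - m) / s
        report[metric] = (z, abs(z) > 1.0)
    return report


# Hypothetical long-term history and this period's observations.
history = {
    "CPU utilization (%)": [50, 55, 60, 65, 70],
    "Response time (s)": [1.0, 1.2, 1.1, 0.9, 1.3],
}
current = {"CPU utilization (%)": 75, "Response time (s)": 1.2}

for metric, (z, flag) in schumacher_flags(history, current).items():
    note = "  <- needs explanation" if flag else ""
    print(f"{metric:>20}: {z:+.2f} sd{note}")
```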

Page 30

Performance Analysis Rat Holes

- Configuration
- Workload
- Metrics
- Details

Page 31

Reasons for not Accepting an Analysis

- "This needs more analysis."
- "You need a better understanding of the workload."
- "It improves performance only for long IOs/packets/jobs/files, and most of the IOs/packets/jobs/files are short."
- "It improves performance only for short IOs/packets/jobs/files, but who cares about the performance of short IOs/packets/jobs/files? It's the long ones that impact the system."
- "It needs too much memory/CPU/bandwidth, and memory/CPU/bandwidth isn't free."
- "It only saves us memory/CPU/bandwidth, and memory/CPU/bandwidth is cheap."

See Box 10.2 on page 162 of the book for a complete list.

Page 32

Examples

Page 33

Summary

- Variables are qualitative or quantitative, ordered or unordered, discrete or continuous
- Good charts should require minimum effort from the reader and provide maximum information with minimum ink
- Use no more than 5 to 6 curves, select ranges properly, and follow the three-quarter-high rule
- Gantt charts show utilizations of various components
- Kiviat graphs show HB and LB metrics alternately on a circular graph
- Schumacher charts show means and standard deviations
- Workload, metrics, configuration, and details can always be challenged, so they should be selected carefully

Page 34

Exercise 10.1

What type of chart (line or bar) would you use to plot:

a. CPU usage for 12 months of the year

b. CPU usage as a function of time in months

c. Number of I/Os to three disk drives: A, B, and C

d. Number of I/Os as a function of the number of disk drives in a system

Page 35

Exercise 10.2

List the problems with the following charts

Page 36

Exercise 10.3

Consider a system consisting of three resources, called A, B, and C. The measured utilizations are shown in the following table. A zero in a column indicates that the resource is not utilized. Draw a Gantt chart showing the utilization profiles.

Page 37

Exercise 10.4

The measured values of the eight performance metrics listed in Example 10.2 for a system are: 70%, 10%, 60%, 20%, 80%, 30%, 50%, and 20%. Draw the Kiviat graph and compute its figure of merit.

Page 38

Exercise 10.5

For a computer system of your choice, list a number of HB and LB metrics and draw a typical Kiviat graph using data values of your choice.

Page 39

Ratio Games

Page 40

Overview

- Ratio Game Examples
- Using an Appropriate Ratio Metric
- Using Relative Performance Enhancement
- Ratio Games with Percentages
- Ratio Games Guidelines
- Numerical Conditions for Ratio Games

Page 41

Case Study 11.1: 6502 vs. 8080

Conclusion: 6502 is worse. It takes 4.7% more time than 8080.

1. Ratio of Totals

Page 42

6502 vs. 8080 (Cont)

1. Ratio of Totals: 6502 is worse. It takes 4.7% more time than 8080.

2. With 6502 as a base: 6502 is better. It takes 1% less time than 8080.

3. With 8080 as a base: 6502 is worse. It takes 6% more time.

[Tables: 2. 6502 as the base; 3. 8080 as the base]

Page 43

Case Study 11.2: RISC vs. CISC

Conclusion: RISC-I has the largest code size. The second processor Z8002 requires 9% less code than RISC-I.

Page 44

RISC vs. CISC (Cont)

Conclusion: Z8002 has the largest code size, and it takes 18% more code than RISC-I. [Peterson and Sequin 1982]


Page 45

Using an Appropriate Ratio Metric

1. Throughput: A is better
2. Response time: A is worse
3. Power: A is better

Example:
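The example's own table is not reproduced in this transcript. The sketch below uses hypothetical values that produce the same three verdicts. It assumes the usual definition of power as throughput divided by response time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    throughput: float     # jobs per second (HB)
    response_time: float  # seconds (LB)

    @property
    def power(self) -> float:
        """Power = throughput / response time (HB)."""
        return self.throughput / self.response_time


a = Measurement(throughput=10.0, response_time=2.0)  # hypothetical values
b = Measurement(throughput=4.0, response_time=1.0)

print("Throughput:", "A is better" if a.throughput > b.throughput else "A is worse")
print("Response time:", "A is better" if a.response_time < b.response_time else "A is worse")
print("Power:", "A is better" if a.power > b.power else "A is worse")
```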

Page 46

Using Relative Performance Enhancement

Example: Two floating point accelerators

Problem: incomparable bases. Both accelerators need to be tried on the same machine.
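A minimal sketch of why speedups measured on different base machines cannot be compared. The timings, the accelerator labels X and Y, and the machine labels are all hypothetical.

```python
def speedup(time_without: float, time_with: float) -> float:
    """Relative performance enhancement: base time / enhanced time."""
    return time_without / time_with


# Hypothetical timings (seconds) of one floating-point workload.
# Accelerator X was measured on machine 1, accelerator Y on machine 2.
print(speedup(10.0, 5.0))  # 2.0: X on machine 1
print(speedup(12.0, 4.0))  # 3.0: Y on machine 2, so Y "looks" better
# Both accelerators measured on machine 1 (same base): X is faster.
print(speedup(10.0, 5.0), speedup(10.0, 6.0))  # X: 2.0, Y: about 1.67
```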

Page 47

Ratio Games with Percentages

Example: Tests on two systems

1. System B is better on both tests
2. System A is better overall

System A:

System B:
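The slide's tables are not reproduced in this transcript. The hypothetical pass counts below show how the combination can flip the per-test result. This reversal between per-test and combined percentages is known as Simpson's paradox. The data layout and function names are hypothetical.

```python
# Hypothetical test outcomes: (runs passed, runs attempted) per test.
results = {
    "A": {"Test 1": (90, 100), "Test 2": (10, 20)},
    "B": {"Test 1": (19, 20), "Test 2": (60, 100)},
}


def pct(passed, total):
    return 100.0 * passed / total


def summarize(tests):
    """Per-test pass percentages and the combined percentage."""
    per_test = {t: pct(p, n) for t, (p, n) in tests.items()}
    passed = sum(p for p, _ in tests.values())
    total = sum(n for _, n in tests.values())
    return per_test, pct(passed, total)


for system, tests in results.items():
    per_test, overall = summarize(tests)
    print(system, per_test, f"overall {overall:.1f}%")
# B wins each test (95% vs 90%, 60% vs 50%), yet A wins overall
# (83.3% vs 65.8%) because A ran most of its runs on the easier test.
```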

Page 48

Percentages (Cont)

Other misuses of percentages:

- A 1000% improvement sounds more impressive than an 11-fold improvement, particularly if the performance before and after the improvement is small in absolute terms
- Small sample sizes can be disguised by percentages
- The base should be the initial value; a "400% reduction in prices" is possible only with the final value as the base
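A minimal sketch of how the choice of base changes a percentage. The prices and the function name are hypothetical.

```python
def percent_change(initial: float, final: float, base: str = "initial") -> float:
    """Percentage change from initial to final, relative to the chosen base."""
    if base not in ("initial", "final"):
        raise ValueError("base must be 'initial' or 'final'")
    reference = initial if base == "initial" else final
    return 100.0 * (final - initial) / reference


# Hypothetical price cut from $500 to $100.
print(percent_change(500, 100))                # -80.0: an 80% reduction
print(percent_change(500, 100, base="final"))  # -400.0: a "400% reduction"
# A 1000% increase is the same as becoming 11 times as large.
print(percent_change(1, 11))                   # 1000.0
```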

Page 49

Ratio Games Guidelines

1. If one system is better on all benchmarks, contradictory conclusions cannot be drawn by any ratio game technique

Page 50

Guidelines (cont)

2. Even if one system is better than the other on all benchmarks, a better relative performance can be shown by selecting an appropriate base. In the previous example, System A is 40% better than System B using raw data, 43% better using System A as the base, and 42% better using System B as the base.

3. If a system is better on some benchmarks and worse on others, contradictory conclusions can be drawn in some cases, but not in all.

4. If the performance metric is an LB metric, it is better to use your system as the base

5. If the performance metric is an HB metric, it is better to use your opponent as the base

6. Those benchmarks that perform better on your system should be elongated and those that perform worse should be shortened

Page 51

Numerical Conditions for Ratio Games

A is better than B iff

A is better than B iff

Raw Data

With A as the Base

Page 52

Numerical Conditions (Cont)

A is better than B iff With B as the base
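Assume an LB metric such as execution time, and write a_i and b_i for the times of A and B on benchmark i, i = 1, …, n (these symbols are introduced here for illustration). The three conditions then follow directly from the definitions of the three methods:

```latex
\begin{aligned}
\text{Raw data:} &\quad \sum_{i=1}^{n} a_i < \sum_{i=1}^{n} b_i \\
\text{With A as the base:} &\quad \frac{1}{n}\sum_{i=1}^{n} \frac{b_i}{a_i} > 1 \\
\text{With B as the base:} &\quad \frac{1}{n}\sum_{i=1}^{n} \frac{a_i}{b_i} < 1
\end{aligned}
```

Consider two benchmarks, and let r_i = b_i/a_i be the ratio of B/A response on benchmark i. The base-A condition becomes r_1 + r_2 > 2, and the base-B condition becomes 1/r_1 + 1/r_2 < 2. The raw-data condition, a_1 r_1 + a_2 r_2 > a_1 + a_2, also depends on the benchmark lengths.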

Page 53

Numerical Conditions (Cont)

[Figure: ratio of B/A response on benchmark j (vertical axis) versus ratio of B/A response on benchmark i (horizontal axis), with boundary curves for raw data, base A, and base B; regions labeled "A is better using all 3" and "B is better using all 3"]

Page 54

Summary

- Ratio games arise from the use of incomparable bases
- Ratios may be part of the metric
- Relative performance enhancements
- Percentages are ratios
- For HB metrics, it is better to use the opponent as the base

Page 55

Exercise 11.1

The following table shows execution times of three benchmarks I, J, and K on three systems A, B, and C. Use ratio game techniques to show the superiority of various systems.

Page 56

Exercise 11.2

Derive conditions necessary for you to be able to use the technique of combined percentages to your advantage.

Page 57

Homework

Read Chapters 10 and 11