data visualisation

Data Visualisation Harvinder Atwal

Upload: harveysa

Post on 12-Jan-2015


Page 1: Data visualisation

Data Visualisation

Harvinder Atwal

Page 2: Data visualisation

Agenda

Warm-Up (5 mins)

Data Visualisation: Why it matters (5 mins)

The Rules (10 mins)

Seven Common Quantitative Relationships (5 mins)

The best means to encode quantitative data in charts (5 mins)

Step by Step Guide (10 mins)

Test (5 mins)


Page 4: Data visualisation

Even the best information is useless if its story is poorly told

The effective display of quantitative information involves two fundamental challenges:

– Selecting the right medium of display (for example, a table or a graph, and the appropriate kind of either)

– Designing the individual visual components of the selected medium to display the information and its message as clearly as possible

Most presentations of quantitative business data are poorly designed – painfully so, often to the point of misinformation.

Anyone can start drawing charts in Excel and use PowerPoint but hardly anyone is trained to do so effectively.

Page 5: Data visualisation

Bad Data Visualisation can have tragic consequences

In January 1986, NASA had to decide whether to launch the Challenger shuttle in a “100-year cold”.

Morton Thiokol engineers produced a chart and recommended that shuttles not be flown below 53°F because of potential damage to the O-rings in the booster rockets.

Morton Thiokol managers accepted the recommendation and passed it on to NASA.

NASA asked for the recommendation to be reconsidered.

Morton Thiokol managers agreed to the flight.

Page 6: Data visualisation

The engineers at Morton Thiokol came up with this chart

Looking at the O-Ring damage over the previous 24 shuttle missions, the data was presented in chronological order showing the location and extent of the damage sustained to the left and right boosters and the temperature at launch time.

Page 7: Data visualisation

The Morton Thiokol engineers failed to convince their management and NASA with fatal consequences


Page 8: Data visualisation


Would this chart have been more convincing?

If instead we remove all the extraneous data and do a simple plot of temperature vs damage then the pattern becomes much clearer.

Never damage above 76F

ALWAYS damage below 66F

Page 9: Data visualisation

WTF!? How many hours of valuable management time have been wasted trying to understand a badly drawn chart?


How many £billions have been wasted on incorrect decisions because someone has misinterpreted a chart message?

Page 10: Data visualisation


To communicate effectively visually you need to understand visual perception and cognition.

Present your message in a way that takes advantage of the strengths of visual perception while avoiding its weaknesses - matching the human thought process.

You can develop a simple set of skills (graphicacy) based on this knowledge.

This is mostly Not

, based on clear-cut principles about what works and what doesn’t


Page 12: Data visualisation

Research Finding: Communication is most effective when you say neither more nor less than what is relevant to your message.

Principle #1: Display neither more nor less than what is relevant to your message.


Page 13: Data visualisation

Tufte’s data-ink ratio is the single most important concept in data visualisation

Data-ink ratio = data-ink / total ink used to print the graphic

= proportion of a graphic’s ink devoted to the non-redundant display of data-information

= 1.0 − proportion of a graphic that can be erased without loss of data-information.

(The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, Cheshire CT, 1983, p.93)
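The arithmetic behind the ratio can be sketched in a few lines. This is only an illustration: the element names and the amounts of "ink" below are hypothetical, and deciding which elements count as data-ink is a judgment the chart designer has to make.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Return data-ink / total ink, the share of a graphic that carries data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical ink budget for a cluttered bar chart (arbitrary units).
chart_ink = {
    "bars": 40.0,             # encodes the values
    "axis_labels": 10.0,      # needed to read the values
    "3d_effect": 20.0,        # can be erased without losing information
    "grey_background": 20.0,  # can be erased without losing information
    "borders": 10.0,          # can be erased without losing information
}
DATA_ELEMENTS = {"bars", "axis_labels"}  # assumption: what counts as data-ink

data_ink = sum(ink for name, ink in chart_ink.items() if name in DATA_ELEMENTS)
total_ink = sum(chart_ink.values())

ratio = data_ink_ratio(data_ink, total_ink)
erasable = 1.0 - ratio  # proportion that can be erased without loss
```

With these made-up numbers half of the ink carries data and half could be erased, which is the same statement expressed in the two forms of the definition above.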

Page 14: Data visualisation

Eliminate all redundant visual information!

You wouldn’t write a document like this, using multiple fonts, gratuitous formatting, redundant and excessive highlighting, variable colours, difficult-to-read italics, pointless underlining, and desperate shadows in multiple sizes.

Yet every day you see the graphical equivalent as people try to make their charts “interesting” instead of useful!

Page 15: Data visualisation

How many items of redundant visual information can you see in this chart?

[Chart: “Sales and Appointments by Region”, a 3-D bar chart of Volume (scale 0 to 200) by Region (Wales and West, London and South East, Scotland and North, Midlands) for two series, Appointments and Sales. Data labels: 112, 100, 183, 150, 97, 75, 91, 85.]

Redundant items marked on the chart:

– 3-D effect

– Border on legend

– Grey background

– Highlighting for no reason

– Floor

– Legend key

– Data labels

– Underlining

– Excessive tick marks

– Vertical lines

– Border on bars

– Border

Page 16: Data visualisation

Less is more; the same chart de-junked…

[Chart: “Sales and Appointments by Region”, the same data as plain bars. Vertical axis: Volumes (0 to 200). Horizontal axis: Region (Wales and West, London and South East, Scotland and North, Midlands). Series: Appointments, Sales.]

Page 17: Data visualisation


Research Finding: People perceive visual differences in an information display as differences in meaning.

Principle #2: Do not include visual differences in a graph that do not correspond to actual differences in the data.

Page 18: Data visualisation

What is the meaning of the different colours that appear on the bars? The answer is “nothing.”


Don’t confuse people and waste their time by including visual differences that are meaningless.

Page 19: Data visualisation


Research Finding: The visual properties that work best for representing quantitative values are the length or 2-D location of objects.

Principle #3: Use the lengths or 2-D locations of objects to encode quantitative values in graphs unless they have already been used for other variables.

Page 20: Data visualisation

#1 How much taller is bar B than A?

[Figure: bars A and B]

Page 21: Data visualisation

#2 How much higher is point A than B?

[Figure: points A and B]

Page 22: Data visualisation

#3 How much bigger is the area of B than A?

[Figure: shapes A and B]

Page 23: Data visualisation

#4 How much darker is circle B than A?

[Figure: circles A and B]

Page 24: Data visualisation

Answers

[Figure: the four comparisons #1 to #4 repeated with their answers; the answer labels on the figure are 5x, 10x, 4x and 5x]

Page 25: Data visualisation

How much taller is bar B than A?

[Figure: bars A and B]

Page 26: Data visualisation

Bar B is actually only 10% bigger than A, not 100%

[Chart: the same bars A and B on a vertical scale that runs from 470 to 560 rather than starting at zero]

Page 27: Data visualisation


Research Finding: People perceive differences in the lengths or 2-D locations of objects fairly accurately and interpret them as differences in the actual values that they represent.

Principle #4: Differences in the visual properties that represent values (that is, differences in their lengths or 2-D locations) should accurately correspond to the actual differences in the values they represent.

Page 28: Data visualisation


Research Finding: People perceive things that appear connected as wholes and things that appear disconnected as discrete.

Principle #5: Do not visually connect values that are discrete, thereby suggesting a relationship that does not exist in the data.

Page 29: Data visualisation

The regions are discrete, so values that measure something going on in these regions should be displayed as discrete.


Connecting discrete items with a line is misleading. Doing so forms a pattern of upwards and downwards slopes that are utterly meaningless.

Page 30: Data visualisation


Research Finding: People pay most attention to and consider most important those parts of a visual display that are most salient.

Principle #6: Make the information that is most important to your message more visually salient in a graph than information that is less important.

Page 31: Data visualisation

Some information is more important to your message than others


You can communicate this fact in a graph by making those items that are most important more visually dominant (salient).

It is your job to direct people’s eyes to the most important parts of the display, so they adequately focus on them.

Page 32: Data visualisation


Research Finding: Short-term memory is limited to about four chunks of information at a time.

Principle #7: Augment people’s short-term memory by combining multiple facts into a single visual pattern that can be stored as a chunk of memory and by presenting all the information they need to compare within eye span.

Page 33: Data visualisation

By presenting quantitative information visually as patterns, more information can be simultaneously stored in short-term memory

Each of the two lines in this line graph combines 12 different sales figures, one per month, into a single pattern of upward and downward sloping line segments.

When encoded in a visual pattern such as this, these 12 numbers can be stored together as a single chunk of information in short-term memory.


Page 35: Data visualisation

Seven common quantitative relationships in graphs and how to display them

Meaningful quantitative information always involves relationships. With rare exceptions in business graphs, these relationships always boil down to one or more of the seven relationships described on the following slides.

Page 36: Data visualisation

Time Series

Expresses the rise and fall of values through time.

– Use lines to emphasize overall pattern.

– Use bars to emphasize individual values.

– Use points connected by lines to slightly emphasize individual values while still highlighting the overall pattern.

– Always place time on the horizontal axis.

Page 37: Data visualisation

Ranking

Expresses values in order by size.

– Use bars only (horizontal or vertical).

– To highlight high values, sort in descending order.

– To highlight low values, sort in ascending order.

Page 38: Data visualisation

Part-to-Whole

Expresses the portion of each part relative to the whole.

– Use bars only (horizontal or vertical).

– Use stacked bars only when you must display measures of the whole as well as the parts.

Page 39: Data visualisation

Deviation

Expresses how and the degree to which one or more things differ from another.

– Use lines to emphasize the overall pattern only when displaying deviation and time-series relationships together.

– Use points connected by lines to slightly emphasize individual data points while also highlighting the overall pattern when displaying deviation and time-series relationships together.

– Use bars to emphasize individual values, but limit to vertical bars when a time-series relationship is included.

– Always include a reference line to compare the measures of deviation against.

Page 40: Data visualisation

Distribution

Expresses a range of values as well as the shape of the distribution across that range.

Single distribution:

– Use vertical bars to emphasize individual values.

– Use lines to emphasize the overall shape.

Multiple distributions:

– Use vertical or horizontal bars (a.k.a. range bars or boxes) to encode the full range from the low value to the high value, or some meaningful portion of the range (for example, 90% of the values).

– Use points or lines together to encode measures of centre (for example, the median).

Page 41: Data visualisation


Correlation

Expresses how two paired sets of values vary in relation to one another.

– Use points and a trend line in the form of a scatter plot.

Page 42: Data visualisation


Nominal Comparison

Simply expresses the comparative sizes of multiple related but discrete values in no particular order.

– Use bars only (horizontal or vertical).


Page 44: Data visualisation


Four types of objects work best for encoding quantitative values in graphs: points, lines, bars, and boxes.

Points

Lines

Bars

Boxes

Page 45: Data visualisation


Points and Lines

Points are the smallest of the objects that are used to encode values in graphs. They can take the shape of dots, squares, triangles, Xs, dashes, and other simple objects. They have two primary strengths:

(1) they can be used to encode quantitative values along two quantitative scales simultaneously, as in a scatter plot, and

(2) they can be used in place of bars when the quantitative scale does not begin at zero.

Unlike lines, points emphasize individual values, rather than the shape of those values as they move up and down.

Lines connect the individual values in a series, emphasizing the shape of the data as it moves from value to value. As such, they are superb for showing the shape of data as it moves and changes through time. Trends, patterns, and exceptions stand out clearly.

You should only use lines to encode data along an interval scale.
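The rule about lines and scale types can be written down as a small lookup. This is a sketch only: the `Scale` enum and the `allowed_encodings` function are hypothetical names introduced for illustration, and the examples in the comments are generic rather than taken from the slides.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"    # unordered categories, e.g. regions
    ORDINAL = "ordinal"    # ordered categories, e.g. small / medium / large
    INTERVAL = "interval"  # sequential subdivisions of a continuous range, e.g. quarters


def allowed_encodings(scale: Scale) -> list[str]:
    """Return the encodings that suit a categorical scale of the given type."""
    if scale is Scale.INTERVAL:
        # Lines are allowed here, but bars and points remain valid choices.
        return ["lines", "bars", "points"]
    # Nominal and ordinal items are not sequential, so no connecting lines.
    return ["bars", "points"]


regions = allowed_encodings(Scale.NOMINAL)
quarters = allowed_encodings(Scale.INTERVAL)
```

A check like this could sit in front of a charting helper so that a line chart over regions is rejected before it is drawn.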

Page 46: Data visualisation

Do not use lines for Nominal or Ordinal scales!

[Chart, marked “Wrong”: Sales (0 to 160) drawn with a line across a nominal scale of regions (Wales and West, London and South East, Scotland and North)]

In nominal and ordinal scales, the individual items are not related closely enough to be linked with lines, so you should use bars or points instead. Lines suggest change from one item to the next, but change isn’t happening if the items aren’t closely related as sequential subdivisions of a continuous range of values. For instance, it is appropriate to use lines to display change from one day to the next or from one price range to the next, but not from one community bank to the next.

[Chart, marked “Wrong”: Sales (0 to 120) across the categories Extra-Value, Standard, Branded, Finest]

Page 47: Data visualisation

Use lines only for Interval scales

[Chart, marked “Right”: Sales (0 to 120) across an interval scale of quarters, Q1 to Q4]

With interval scales, you are not forced in all cases to use lines; you can use bars and points as well. If you want to emphasize the overall shape of the data or changes from one item to the next, lines work best.

If, however, you want to emphasize individual items, such as individual months, or to support discrete comparisons of multiple values at the same location along the interval scale, such as revenues and expenses for individual months, then bars or points work best.

[Chart: Sales (0 to 120) by quarter, Q1 to Q4]

Page 48: Data visualisation


Bars encode data in a way that emphasizes individual values powerfully

This ability is due in part to the fact that bars encode quantitative values in two ways:

(1) the 2-D position of the bar’s endpoint in relation to the quantitative scale, and

(2) the length of the bar.

You probably recognize that these two characteristics correspond precisely to the two visual attributes that can be used to encode data in graphs. When you want to draw focus to individual values or to support the comparison of individual values to one another (see figure 19), bars are an ideal choice. They don’t, however, do as well as lines in revealing the overall shape of the data. Bars may be oriented vertically or horizontally.

[Chart: Budget and Actual bars for Rewards and Exchange on a scale from 0 to 100]

Page 49: Data visualisation


Whenever you use bars, your quantitative scale must include zero

The lengths of the bars encode their values, but won’t do so accurately if those values don’t begin at zero. Notice what happens when you narrow the quantitative scale and use bars below. Actual sales appear to be half of planned sales, but in fact they are 90% of the plan.

[Chart: bars A and B on a scale from 470 to 560]

When you would normally use bars, but wish to narrow the quantitative scale to show differences between the values in greater detail, you should switch from bars to points, because points encode values merely as 2-D location in relation to the quantitative scale, which eliminates the need to begin the scale at zero.

[Chart: Budget and Actual for Rewards and Exchange on a scale from 70 to 100]
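The distortion caused by a bar scale that does not start at zero is simple arithmetic, sketched below. The plan and actual figures and the baseline of 400 are hypothetical, picked so that the numbers come out round; they are not read off the chart on this slide.

```python
def apparent_ratio(reference: float, value: float, baseline: float) -> float:
    """Ratio of drawn bar lengths (value : reference) when the axis starts at baseline."""
    if reference <= baseline or value <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (value - baseline) / (reference - baseline)


# Hypothetical figures: actual sales are 90% of plan.
plan, actual = 500.0, 450.0

true_ratio = actual / plan                                   # what the data says
from_zero = apparent_ratio(plan, actual, baseline=0.0)       # bars drawn from zero
truncated = apparent_ratio(plan, actual, baseline=400.0)     # bars drawn from 400
```

Drawn from zero, the bar lengths keep the true 90% relationship. Drawn from 400, the actual bar is only half as long as the plan bar, which is the kind of misreading the slide warns about. Points avoid the problem because their position, not their length, carries the value.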

Page 50: Data visualisation


Boxes

Boxes are a lot like bars, except that both ends encode quantitative values. When bars are used in this way, they are sometimes called range bars. They are used to encode a range of values, usually from the highest to the lowest, rather than a single value.

In the 1970s John Tukey invented a method of using rectangles (bars with or without fill colors) in combination with individual data points (often a short line) and thin bars to encode several facts about a distribution of values, including the median (middle value), middle 50%, etc.

He called his invention a box plot (a.k.a. box-and-whisker plot).
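The facts a basic box plot encodes can be computed with the Python standard library. This is a minimal sketch of the simplest variant, in which the ends run from the lowest to the highest value; the function name and the sample values are hypothetical, and other quartile conventions give slightly different results.

```python
import statistics


def box_plot_summary(values):
    """Return the five numbers a basic box plot encodes."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    # Quartiles: the box spans q1 to q3 (the middle 50%), with the median inside.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "low": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "high": ordered[-1],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
# summary: low 1, q1 3.0, median 5.0, q3 7.0, high 9
```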


Page 52: Data visualisation

Step 1: Determine your message

Don’t just turn your data into a chart!

Think about what your data means, what you want to communicate and, most importantly, your audience’s needs.

Will the data be used to look up and compare individual values, or will the data need to be precise? If so, you should display it in a table.

Is the message contained in the shape of the data—in trends, patterns, exceptions, or comparisons that involve more than a few values? If so, you should display it in a graph.

Or, do both.
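The table-or-graph questions above can be condensed into a small decision function. It is a sketch only: the function and parameter names are hypothetical, and real cases will often need more judgment than three yes/no answers.

```python
def choose_display(needs_lookup: bool, needs_precision: bool, message_in_shape: bool) -> str:
    """Pick a table, a graph, or both from the Step 1 questions."""
    wants_table = needs_lookup or needs_precision
    if wants_table and message_in_shape:
        return "table and graph"
    if wants_table:
        return "table"
    if message_in_shape:
        return "graph"
    # None of the questions applies: the message has not been determined yet.
    raise ValueError("no message identified yet; go back to Step 1")


price_list = choose_display(needs_lookup=True, needs_precision=True, message_in_shape=False)
sales_trend = choose_display(needs_lookup=False, needs_precision=False, message_in_shape=True)
```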

Page 53: Data visualisation

Step 2: Determine the best means to encode the values

What am I trying to represent?

Nominal comparison

– Bars (horizontal or vertical)

– Points (if the quantitative scale does not include zero)

Correlation

– Points and a trend line in the form of a scatter plot

Time series

– Lines to emphasize the overall shape of the data

– Bars to emphasize and support comparisons between individual values

– Points connected by lines to slightly emphasize individual values while still highlighting the overall shape of the data

Ranking

– Bars (horizontal or vertical)

– Points (if the quantitative scale does not include zero)

Part-to-whole

– Bars (horizontal or vertical)

– Use stacked bars only when you must display measures of the whole as well as the parts

Note: Pie charts are commonly used to display part-to-whole relationships, but they don’t work nearly as well as bar graphs because it is much harder to compare the sizes of slices than the length of bars.

Deviation

– Lines to emphasize the overall shape of the data (only when displaying deviation and time-series relationships together)

– Points connected by lines to slightly emphasize individual data points while also highlighting the overall shape (only when displaying deviation and time-series relationships together)

Frequency distribution

– Bars (vertical only) to emphasize individual values. This kind of graph is called a histogram.

– Lines to emphasize the overall shape of the data. This kind of graph is called a frequency polygon.
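The Step 2 guidance can be kept as a lookup table. The sketch below is one possible encoding of the list above; the dictionary keys, the `recommend` function and the short option strings are hypothetical names, and the zero-scale rule is applied only to the two relationships where the slide mentions it.

```python
ENCODINGS = {
    "nominal comparison": ["bars", "points"],
    "correlation": ["points with a trend line (scatter plot)"],
    "time series": ["lines", "bars", "points connected by lines"],
    "ranking": ["bars", "points"],
    "part-to-whole": ["bars", "stacked bars"],
    "deviation": ["lines", "points connected by lines"],
    "frequency distribution": ["vertical bars (histogram)", "lines (frequency polygon)"],
}

# Relationships where the choice between bars and points depends on the scale.
ZERO_SENSITIVE = {"nominal comparison", "ranking"}


def recommend(relationship: str, scale_includes_zero: bool = True) -> list[str]:
    """Return the suggested encodings for one of the seven relationships."""
    key = relationship.strip().lower()
    if key not in ENCODINGS:
        raise KeyError(f"unknown relationship: {relationship!r}")
    if key in ZERO_SENSITIVE:
        # Bars need a zero-based scale; otherwise switch to points.
        return ["bars"] if scale_includes_zero else ["points"]
    return list(ENCODINGS[key])


ranking_narrow_scale = recommend("Ranking", scale_includes_zero=False)
correlation = recommend("Correlation")
```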

Page 54: Data visualisation


Step 3: Determine where to display each variable – One Variable

Place the categorical variable on the x-axis if your graph will include ONE categorical variable and any one of the following is true:

• The categorical scale is an interval scale

• You are using lines to encode the data

• You are using bars to encode the data and the labels are not long or many

If you are using bars, place the categorical variable on the y-axis when either of these two conditions exists:

• The text labels associated with the bars are long

• There are many bars.
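The orientation rule for bars can be sketched as a function. The slide does not say how long "long" is or how many is "many", so the two thresholds below are hypothetical defaults that you would tune for your own charts.

```python
def bar_orientation(labels, max_label_chars=12, max_bars=8):
    """Return 'horizontal' (categories on the y-axis) or 'vertical' (categories on the x-axis).

    max_label_chars and max_bars are illustrative thresholds, not values from the slides.
    """
    long_labels = any(len(label) > max_label_chars for label in labels)
    many_bars = len(labels) > max_bars
    return "horizontal" if long_labels or many_bars else "vertical"


quarters = bar_orientation(["Q1", "Q2", "Q3", "Q4"])
products = bar_orientation([
    "Beef", "Fresh pork", "Lamb", "Bacon", "Sausage", "Beef fillet jnt",
    "Beef sirloin joint", "Pork roulades", "Fresh pork mince",
    "Fresh poultry gravy", "Beef stock", "4 beef burgers",
    "8 beef steak burgers", "Angus burgers",
])
```

With these thresholds the four quarters stay vertical, while the fourteen product labels are placed on the y-axis.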

[Chart: horizontal bars on a scale from 0 to 120, with one readable label per bar: Beef, Fresh pork, Lamb, Bacon, Sausage, Beef fillet jnt, Beef sirloin joint, Pork roulades, Fresh pork mince, Fresh poultry gravy, Beef stock, 4 beef burgers, 8 beef steak burgers, Angus burgers]

Is better than

[Chart: the same 14 categories as vertical bars on a scale from 0 to 120, with rotated labels that are hard to read]

Page 55: Data visualisation

Step 3: Determine where to display each variable – Two or three variables

If the graph involves two or three variables, you must decide which to display along the axes and which to encode using distinct versions of another visual attribute, such as colour.

With a line graph, place the variable that is most important to your message along the X axis.

With a bar graph, encode the variable whose items you want to make it easiest to compare using a method other than association with an axis. Notice how much easier it is to compare appointments and sales than the regions, because they are positioned next to one another.

[Chart: Appointments and Sales bars side by side for each region (Wales and West, London and South East, Scotland and North, Midlands) on a scale from 0 to 200]

Page 56: Data visualisation


Step 3: Determine where to display each variable - the problem of the fourth variable

The solution is a series of small graphs (small multiples), each arranged in the same way as a graph with three variables and all placed together so that they can be seen simultaneously. The graphs are alike, including consistent scales, and differ only in that each features a different item of a fourth, categorical variable, here the sales channel (e.g. product).

Using small multiples to support an additional variable is a powerful technique. Graphs can be arranged horizontally, vertically, or even in a matrix of columns and rows. If you need to display one more variable than you can fit into a single graph, select this approach.
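The data preparation behind small multiples is a grouping step plus one shared scale. The sketch below assumes bar charts with non-negative values, so the shared scale starts at zero; the function name, the row layout and the sample figures are all hypothetical.

```python
from collections import defaultdict


def small_multiples(rows, panel_key, value_key):
    """Group rows into one panel per item of the extra variable, with a shared scale."""
    if not rows:
        raise ValueError("rows must not be empty")
    panels = defaultdict(list)
    for row in rows:
        panels[row[panel_key]].append(row)
    # Every panel uses the same scale so that bars are comparable across panels.
    shared_scale = (0, max(row[value_key] for row in rows))
    return dict(panels), shared_scale


# Hypothetical data: two products, one region, two measures.
rows = [
    {"panel": "Product X", "region": "Midlands", "measure": "Sales", "value": 80},
    {"panel": "Product X", "region": "Midlands", "measure": "Appointments", "value": 95},
    {"panel": "Product Y", "region": "Midlands", "measure": "Sales", "value": 150},
    {"panel": "Product Y", "region": "Midlands", "measure": "Appointments", "value": 170},
]
panels, scale = small_multiples(rows, panel_key="panel", value_key="value")
```

Each entry in `panels` would then be drawn as its own small graph, all with the axis range held in `scale`.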

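[Chart: small multiples, one panel each for Face-Value, Rewards and Big Exchange, all on the same scale from 0 to 200, with bars for the regions Wales and West, London and South East, Scotland and North and Midlands. Other labels on the chart: Sales, Appointments, 2010, 2011.]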

Page 57: Data visualisation


Step 4: Determine the best design for the remaining objects - Scale

It’s now time to make a series of design decisions that remain, including the scales and text. These decisions are concerned with the placement and visual appearance of items.

If the graph will be used for analysis purposes that require seeing the differences between values in as much detail as possible, narrowing the scale can be useful. Generally, you should adjust the scale so that it extends a little below the lowest data value and a little above the highest.

If you are using bars to encode the data, but your message could be better communicated by narrowing the scale, remember to switch from bars to points!
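The two scale rules, a little padding around the data for points and lines and a zero-based scale for bars, can be sketched as one helper. The 10% padding is a hypothetical default, since the slide only says "a little", and the sample sales figures are made up.

```python
def axis_limits(values, encoding, padding=0.1):
    """Return (low, high) for the quantitative scale.

    padding is the fraction of the data range added at each end (illustrative default).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        span = abs(hi) or 1.0  # avoid a zero-height scale for constant data
    pad = span * padding
    low, high = lo - pad, hi + pad
    if encoding == "bars":
        # Bar length encodes the value, so the scale must include zero.
        low, high = min(low, 0.0), max(high, 0.0)
    return low, high


quarterly_sales = [1000, 1200, 1500, 1400]  # hypothetical figures
points_scale = axis_limits(quarterly_sales, encoding="points")
bars_scale = axis_limits(quarterly_sales, encoding="bars")
```

For points the scale hugs the data (roughly 950 to 1550 here), while for bars the lower end is pulled back to zero.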

[Charts: values A and B on a scale from 470 to 560; Sales by quarter (Q1 to Q4) on a scale from 800 to 1800]

Page 58: Data visualisation


Step 4: Determine the best design for the remaining objects - Legend

If a legend is required and you are using lines, label the lines directly.

If you are using bars, place the legend above the plot area with the labels arranged side-by-side in the same order as the bars.

[Chart: Sales (800 to 1800) by quarter, Q1 to Q4, with the lines labelled London and South East and Wales and West]

[Chart: Budget and Actual bars for Rewards and Exchange on a scale from 0 to 100]

Page 59: Data visualisation

Step 4: Determine the best design for the remaining objects – Tick Marks and Scales

Tick marks are only necessary on quantitative scales, for they serve no real purpose on categorical scales. Between 5 and 10 tick marks usually does the job; too many clutter the graph and too few fail to give the level of detail needed to interpret the values.

If the graph can be read with the scale in only one place (left, right, top, bottom), place it nearest the data you want to emphasise or make easiest to read.

If the graph is so large it cannot be read with only one scale, place a scale on both sides (top and bottom, or left and right).
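The 5 to 10 tick mark guideline can be approximated with the familiar "nice numbers" approach: try steps of 1, 2 or 5 times a power of ten and take the smallest one that yields at most ten ticks. This is a sketch of that general technique, not a method given in the slides, and `nice_ticks` is a hypothetical name. For awkward ranges it can return fewer than five ticks.

```python
import math


def nice_ticks(low, high, max_ticks=10):
    """Return tick values at a 1/2/5 x 10^k step, using at most max_ticks ticks."""
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    exponent = math.floor(math.log10(span / max_ticks))
    # Walk through candidate steps from small to large until the ticks fit.
    for exp in (exponent, exponent + 1, exponent + 2):
        for mult in (1, 2, 5):
            step = mult * 10 ** exp
            first = math.ceil(low / step)   # index of the first tick inside the range
            last = math.floor(high / step)  # index of the last tick inside the range
            if last - first + 1 <= max_ticks:
                return [i * step for i in range(first, last + 1)]
    raise RuntimeError("no suitable step found")


volume_ticks = nice_ticks(0, 200)
sales_ticks = nice_ticks(800, 1800)
```

For the two ranges used on these slides this gives 0, 50, 100, 150, 200 and 800, 1000, 1200, 1400, 1600, 1800.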

Page 60: Data visualisation

Step 4: Determine the best design for the remaining objects – Gridlines

Unless gridlines are necessary to understand your message or to divide a scatter plot into sections, leave them off; when you do use them, subdue them visually. Bear in mind that graphs display patterns and relationships. If you want to communicate data with a high degree of quantitative accuracy, use a table.

[Chart: Sales (800 to 1800) by quarter, Q1 to Q4]

Page 61: Data visualisation

Step 4: Determine the best design for the remaining objects – Descriptive Text

Although the primary message of a graph is carried in the picture it provides, text is always required to some degree to clarify the meaning of that picture. Some text is often needed, including:

– A descriptive title

– Axis titles (unless the nature of the scale and its unit of measure are already clear)

Numbers in the form of text along quantitative scales are always necessary and legends often are. It is often useful to include one or more notes to describe what is going on in the graph, what ought to be examined in particular, or how to read the graph, whenever these bits of important information are not otherwise obvious.

[Chart: “Widget Sales by Region and Calendar Quarter (2007)”. Sales (800 to 1800) by quarter, Q1 to Q4, with lines labelled London and South East and Wales and West. Note on the chart: “Widget sales in London and South East have been ahead of Wales and West with the exception of Q3”]

Page 62: Data visualisation


Step 5: Determine if particular data should be featured, and if so, how

The final major stage in the process involves highlighting particular data if some data is more important than the rest. Whatever the reason, you have a number of possible ways to make selected data stand out.

One of the best and simplest ways is to encode those items using bright or dark colours, which will stand out clearly if you’ve used soft colours for everything else. Other methods include:

– When bars are used, place borders only around those bars that should be highlighted.

– When lines are used, make the lines that must stand out thicker.

– When points are used, make the featured points larger or include fill colour in them alone.

[Charts: Sales by quarter (Q1 to Q4) on a scale from 0 to 120; Sales by quarter (Q1 to Q4) on a scale from 800 to 1800; items A and B]
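The colour rule from Step 5, soft colours for everything and a bright or dark colour for the featured items only, can be sketched as a mapping. The two hex colours and the function name are hypothetical choices made for illustration.

```python
SOFT = "#b0b0b0"    # assumption: a muted grey for ordinary items
BRIGHT = "#d62728"  # assumption: a strong red for featured items


def highlight_colours(items, featured):
    """Map each item to a colour, giving only the featured items the bright one."""
    unknown = set(featured) - set(items)
    if unknown:
        raise ValueError(f"featured items not in data: {sorted(unknown)}")
    return {item: (BRIGHT if item in featured else SOFT) for item in items}


colours = highlight_colours(["Q1", "Q2", "Q3", "Q4"], featured={"Q3"})
```

The resulting dictionary can be handed to whatever charting tool you use, so that Q3 stands out and the other quarters recede.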

Page 63: Data visualisation

Remember to follow this process for graph selection and design in order to communicate your information in the most effective manner

Determine your message and identify your data

Determine if a table, graph, or combination of both is needed to communicate your message

Determine the best means to encode the values

Determine where to display each variable

The best means to encode quantitative data in charts

Determine the best design for the remaining objects

Determine if particular data should be featured, and if so, how

Page 64: Data visualisation

Summary

Whenever you create a graph, you have a choice to make — to communicate or not. That’s what it all comes down to. If you have something important to say, then say it clearly and accurately. These guidelines are designed to help you do just that.


Page 66: Data visualisation


Which graph makes it easier to determine whether Mid-Cap US stocks or Small-Cap US stocks have a greater share?

A B

Page 67: Data visualisation


Which of these line graphs is easier to read?

A B

Page 68: Data visualisation


Which of these tables is easier to read?

A

B

Page 69: Data visualisation


Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?

A

B

Page 70: Data visualisation


Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately?

A B

Page 71: Data visualisation


Which map makes it easier to find all of the counties with positive growth rates?

A B

Page 72: Data visualisation


Which graph makes it easier to determine R&D’s travel expense?

A

B

Page 73: Data visualisation


In which graph are the labels easier to read?

A B

Page 74: Data visualisation


Which graph is easier to look at?

A

B

Page 75: Data visualisation


Which table allows you to see the areas of poor performance more quickly?

A

B

Page 76: Data visualisation

What percentage of the population is colour-blind?