designing graphs

Upload: ion-dobre

Post on 03-Apr-2018


    Data Visualization By: Taggert J. Brooks


    Representing Data Graphically

    Data visualization, sometimes called information visualization, or infovis[1] for short, comes from the convergence of computer science, statistics and design. It is a marriage between science and art, between the left and right halves of the brain. The goal is to make data presentation interesting, aesthetically pleasing and hopefully informative. Good data visualization goes further by revealing relationships in the data that might otherwise have gone unnoticed. With the absence of hypothesis tests it is easy to discount visualization as unscientific, but that would be a mistake. There are many uses of data visualization, and the reality is that hypothesis testing can bore the audience, if not completely surpass their level of understanding. Data visualization, then, is a means to an end for statisticians who want to be better communicators. And it is a pathway to a better understanding of the data for the designers among us.

    "In our excitement to produce what we could only make before with great effort, many of us have lost sight of the real purpose of quantitative displays - to provide the reader with important, meaningful, and useful insight." Stephen Few

    I would add that good visualization techniques will not only help the reader, but also help the producer of the visualization to discover meaningful insights.

    This document is meant to be an introduction to different visualization techniques, and though I provide some practical how-to, I do not provide everything. Where I fail, Google and the internet can fill in the gaps.

    Too Much Data

    The internet has led to an explosion in the amount of data that is collected, stored and easily accessible. It has done this by dramatically lowering the costs of those activities. The problem we now face is filtering the valuable data from the worthless data and determining how we use it to inform business decisions or research. A recent example of the ubiquity of new data can be taken from the presidential election. We have data on the frequency of word searches in Google by each minute of the Vice Presidential debate between Senator Joe Biden and Governor Sarah Palin.[2] Apparently people were trying to figure out exactly what a Maverick actually is.

    What type of media will you use to make your presentation? How long does your audience have to take in the data? The longer the audience has, the more data dense the visualization can and should be. The less time and autonomy your audience has to peruse the data, the more simplified the visualization should be.

    [1] A wiki dedicated to Infovis: http://www.infovis-wiki.net/index.php?title=Main_Page
    [2] A graph of the searches can be found here: http://www.readwriteweb.com/archives/google_has_changed_political_d.php


    Will it be a written report, a PowerPoint presentation, or is the data going to be rendered on the web? In other words, will the visualization be static or dynamic? These questions are some of the first you should answer when selecting a visualization method.

    Visualization is about Discovery, Discerning Patterns, and Disseminating Information.

    Below we have a nice infographic describing the data collection to data use continuum.

    A good example of the effectiveness of visualization for identifying outliers or data errors can be found below. This is derived from [3].

    [3] http://www.visualizingeconomics.com/2009/07/12/data-scienist-data-geek-designer/


    The picture above is a great way of using visualization to identify errant data. The underlying data in this case must be no more than 100%, yet we can see one mistaken observation.[4]

    Selecting the Right Graph

    Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper.

    Edward Tufte, The Visual Display of Quantitative Information

    Selecting the appropriate display can be difficult because it involves a good understanding of the nature of your data and of statistics, as well as a good understanding of design principles. There are many possibilities for a given variable or dataset, but you need a place to start. There are a few web pages which try to help, but none satisfy both the issues of statistics and design simultaneously.[5] As the quote by Tufte suggests, the choice of design does not easily fit into a simple algorithm.

    [4] This is from the higher ed weblog: http://blog.une.edu.au/robbi/2009/08/06/data-testing-using-visualisation/
    [5] This webpage, http://interface.fh-potsdam.de/infodesignpatterns/news.php, is closer to the visual end, while this webpage, http://www.ncsu.edu/labwrite/res/gh/gh-graphtype.html, does a better job of helping select the appropriate graph from a statistics perspective, and this one helps choose the right statistical test: http://www.ats.ucla.edu/stat/stata/whatstat/default.htm


    Some other examples of websites which try to provide guidance in the choice of appropriate representations can be found in the blog entry titled "Things should be made as simple as possible, but not any simpler"[6], which is a famous Einstein quote.

    1. Determine the relationship you want to display
    2. Determine if you want to emphasize individual values or the overall pattern
    3. Determine the chart type

    Bad charts

    Before we begin discussing some of the common, and not so common, visualizations, it might be better to provide some links to bad charts and improvements. Stephen Few provides some excellent examples of bad charts and then provides recommendations for fixing the problems.[7] Another set of examples is provided here.[8]

    Many of these criticisms and corrections are based upon the rules and suggestions from the work of Edward Tufte. His rules can be found at his website.[9]

    [6] http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/ A follow-up can be found here as well: http://blog.xlcubed.com/household-income-distribution-1967-2005-as-small-multiples-chart/. Still another example of a chart chooser can be found here: http://chartchooser.juiceanalytics.com/, which also produces Excel templates from your choices.
    [7] http://www.perceptualedge.com/examples.php
    [8] http://lilt.ilstu.edu/jpda/charts/bad_charts1.htm
    [9] http://www.washington.edu/computing/training/560/zz-tufte.html


    Seth Godin, the famed marketer, has rules for making good graphs.[10]

    Graph Types

    Microsoft Excel is a common tool for creating graphic representations, but sadly its default choices are often not good design choices, and many of the default graphs it provides should never be used. While Excel 2007 is much better than the horrible defaults in Excel 2003, both can benefit from some alterations. For some details on altering the charts after Excel has created one using the default templates, see the links below.[11][12]

    Some traditional graphical means of data representation, which can be found under the INSERT ribbon in Excel 2007:

    Pie chart

    The pie chart is useful for representing the relative proportions of a few categories. The more categories, the greater the number of slices and the more difficult the chart is to read.

    The field of information visualization is rather new, and like any new field it has very impassioned people with starkly different opinions. For some their beliefs are almost religious, and the rules they profess are delivered with the same vigor as a Baptist minister delivering a sermon from the pulpit. An example of this occurred in the blogosphere when marketing guru Seth Godin suggested there should be no more bar charts, only pie charts. This led to a swift reply from the community of InfoVis folks, many of whom countered with the exact opposite advice. Remember the quote from Tufte above: the reality is always somewhere in between, born of the exercise of good judgment.

    The problem with pie charts, as infovis people will tell you, is that consumers of visualizations have a hard time estimating angles. In fact, they get them wrong, thus drawing the wrong inference from the slices of a pie chart. People are better at visually judging height, which is why many infovis people prefer the column chart.[13] The visual hierarchy of Cleveland is provided at this website.[14]

    [10] http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html
    [11] How to alter the defaults in Excel: http://blog.xlcubed.com/defaults-in-excel-charting/
    [12] http://www.juiceanalytics.com/writing/fixing-excel-charts/
    [13] http://seedmagazine.com/content/article/getting_past_the_pie_chart/
    [14] http://www.processtrends.com/TOC_data_visualization.htm


    [Figure not reproduced.][15]

    Bar and Column Charts

    Bar charts are often good for representing categorical data. You can present the frequency of responses in each category, or the relative frequency.[16] You can also present the frequency or relative frequency of one variable over the groups or categories of another variable, making it an excellent choice when you have two categorical variables.

    [Figure: three example charts, labeled Column chart, Bar Chart and Stacked Bar Chart.]

    Here is a recent bar chart I used to highlight the US debt to GDP ratio. Notice the use of the single red bar to draw attention to the US relative to the rest of the OECD. Imagine how ugly and confusing this would look if I used a different color for every country. How would this look if I used the same color for every country? Obviously this works in color; would it work in grayscale?
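    The single-highlight idea can be sketched in a few lines of Python. This is only an illustration: the function name bar_colors, the gray base color and the shortened country list are assumptions made for the example, not details of the original chart.

```python
def bar_colors(labels, focus, focus_color="red", base_color="gray"):
    """Return one color per label: the focus label is highlighted, all others share a base color."""
    return [focus_color if label == focus else base_color for label in labels]


# Hypothetical, shortened list of countries for illustration only.
countries = ["Japan", "Greece", "Italy", "United States", "Australia"]
colors = bar_colors(countries, "United States")
print(colors)
```

    The resulting list can be handed to whatever charting tool you use as the per-bar colors. Keeping every other bar in one muted color is what lets the single red bar carry the message.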

    [15] http://peltiertech.com/WordPress/pie-chart-for-pi-day/
    [16] Most of the charts in this article were produced in Microsoft Excel 2007, unless otherwise noted. They were copied into Word 2007 using the Paste > Paste Special > Microsoft Excel object function.


    Line Graph

    The traditional line graph is generally used to show a single variable (usually continuous) over time, with time represented on the horizontal axis, though it could also be used to show the relative frequency of a single response category over time.

    [Figure: example line graph.]

    [Figure: bar chart, "2008 Debt to GDP Ratio for OECD", axis from 0 to 180, with one bar each for Japan, Greece, Italy, Belgium, Portugal, Hungary, United Kingdom, Austria, France, Netherlands, Poland, Iceland, United States, Turkey, Germany, Sweden, Spain, Denmark, Finland, Korea, Canada, Ireland, Czech Republic, Slovak Republic, Mexico, Switzerland, New Zealand, Norway, Luxembourg and Australia.]


    A few quick notes about the above graph. I've removed the horizontal gridlines, as they were an example of ink with no purpose. The background fill of the chart area has been changed to white. I added shaded bars to denote recessions. If I were to improve this further, I would probably reduce the number of labels on the horizontal axis, say to every 36 months rather than 24. I'd also probably reduce the number of labels on the vertical axis, as it currently feels a bit cluttered. Finally, I might eliminate the title altogether and make a very small footnote that contained the same information. Or maybe just title the chart "Employment" and relegate the details to the footnote.

    Area Chart

    An area chart is a line chart with the area below the line shaded. This can be useful when you have two lines over time and one line represents a subset of the other. For example, you could have retail sales over time broken into two categories, durable and non-durable goods.

    [Figure: example area charts; the second has a vertical axis from 0 to 200 and the years 1995 to 2004 on the horizontal axis.]

    Scatter Plot

    Scatter plots are useful when you have two continuous variables, with one represented on the X axis and the other on the Y axis. A third variable can be used to measure another attribute of the points, yielding a bubble chart, which will be discussed later.

    [Figure: line graph, "U.S. Payroll Employment: Total Nonagricultural: SA, Thousands of Persons"; vertical axis from 107.0 to 142.0, horizontal axis from Jan-90 to Jan-08.]


    [Figure: example scatter plot, both axes from 0 to 100.]

    Tables

    We should not always rush to make a chart. Sometimes just presenting the numbers in tabular form is sufficient to get your point across, or maybe you blend both. Below are two examples using the conditional formatting in Excel 2007, which blends the graphic design of a chart with the data in tabular form.[17]

    Leisure     Time Spent
    biking      125
    hiking       40
    reading      30
    singing      25
    dancing      10
    cleaning      5

    [The table appears twice in the original, each copy with different conditional formatting.]

    Whenever presenting data like this, it is useful to rank order the data from largest to smallest. Failure to do so makes it a bit harder for the reader to sift through the data, as you can see from the example below.

    Leisure     Time Spent
    biking      125
    hiking        5
    reading      50
    singing      75
    dancing      10
    cleaning     80

    [The table appears twice in the original, each copy with different conditional formatting.]
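    Rank ordering is a one-line operation in most tools. As a small sketch, here is the unsorted table above sorted from largest to smallest in Python; the variable names are mine and only for illustration.

```python
# Values taken from the unsorted example table above.
time_spent = {
    "biking": 125,
    "hiking": 5,
    "reading": 50,
    "singing": 75,
    "dancing": 10,
    "cleaning": 80,
}

# Sort the (activity, value) pairs by value, largest first.
ranked = sorted(time_spent.items(), key=lambda item: item[1], reverse=True)

for activity, value in ranked:
    print(f"{activity:<10}{value:>5}")
```

    In Excel the equivalent is simply sorting the value column in descending order before applying the conditional formatting.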

    A simple way to quickly deemphasize the numbers is to change the font of the numbers to white.

    Leisure     Time Spent
    biking      125
    hiking       40
    reading      30
    singing      25
    dancing      10
    cleaning      5

    [17] In the Home ribbon, select Conditional Formatting > Data Bars.


    The one very unfortunate issue with this technique is that Microsoft Excel violates an important statistical and visualization principle with its bars. Zero values should be represented by the absence of any color, bar or indicator. Yet no matter how small the lowest quantity in the range of cells, the bar appears to be about 5%, even if the value is zero, as can be seen in the example below.[18]

    Leisure     Time Spent
    biking      125
    hiking       40
    reading      30
    singing      25
    dancing      10
    cleaning      0
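    For comparison, here is a minimal sketch of a data bar that respects the zero principle: the bar length is strictly proportional to the value, so a zero produces no bar at all. The function name data_bar, the text-based rendering and the width of 25 characters are assumptions for the illustration; this is not how Excel draws its bars.

```python
def data_bar(value, max_value, width=25, mark="#"):
    """Return a text bar whose length is proportional to value; zero or negative values give an empty bar."""
    if max_value <= 0 or value <= 0:
        return ""
    return mark * round(width * value / max_value)


# Values from the table above, including the zero for cleaning.
time_spent = [("biking", 125), ("hiking", 40), ("reading", 30),
              ("singing", 25), ("dancing", 10), ("cleaning", 0)]
largest = max(value for _, value in time_spent)

for activity, value in time_spent:
    print(f"{activity:<10}{value:>4} {data_bar(value, largest)}")
```

    Because the scale starts at zero, the bar for hiking (40) is 8 characters long against 25 for biking (125), the same ratio as the numbers themselves, and cleaning shows nothing.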

    Sparklines

    Sparklines are small inline line graphs developed by Edward Tufte.[19]

    [Figure: sparkline labeled "GDP [5.8%]", shown twice.][20]

    Notice how simple the sparkline is. We have removed the clutter of the Y and X axis labels, yet the important information is still there. You see the relative values: clearly GDP is not currently at its highest value, yet it is higher than before. Compare that to the more traditional graph below:

    [Figure: traditional line graph of GDP, vertical axis from 0 to 10, years 1995 to 2004.]

    [18] Thanks to the excellent Juice Analytics for making this point: http://www.juiceanalytics.com/writing/excel-2007-and-lie-factor/
    [19] Edward Tufte's explanation of the theory and practice of sparklines: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1
    [20] The sparkline was created with the free open source add-in for Microsoft Excel called TinyGraphs. It can be found here: http://www.spreadsheetml.com/products.html.


    This representation clearly consumes more space, and invites the reader to linger on the chart rather than on the point you are trying to make about the data. However, this type of chart has its place. For example, it might be a better representation if it is important for the reader to see that the highest value occurred in 1996, or that the lowest value was in 1995, or if you want them to easily see that GDP fluctuates between 2% and 6%.

    It is important to note that sparklines can be more than just line charts. They can be bar charts, pie charts, etc. "Sparklines" merely refers to what Edward Tufte calls "Intense, Simple, Word-Sized Graphics". Sparklines are obviously not well suited for PowerPoint-type presentation graphics, but are well suited for written reports, or the currently in vogue data-dense business intelligence reports referred to as dashboards.[21]
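    To give a feel for how small a word-sized graphic can be, here is a rough sketch that renders a series as a row of Unicode block characters. It is a bar-style stand-in rather than Tufte's line sparkline or the TinyGraphs add-in, and the function name and sample values are made up for the illustration.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight block characters, shortest to tallest


def sparkline(values):
    """Map each value to one of eight block characters, scaled between the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no variation to show.
        return BLOCKS[0] * len(values)
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[int((v - lo) * scale + 0.5)] for v in values)


# Hypothetical growth rates, not the data behind the figure above.
growth = [2.0, 6.0, 4.0, 3.0]
print("GDP", sparkline(growth), f"[{growth[-1]}%]")
```

    Note the design choice: the series is scaled between its own minimum and maximum, so the sparkline shows relative movement only. That is exactly the trade-off described above, where the axis labels are given up in exchange for compactness.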

    Bullet Graph

    The bullet graph, due to Stephen Few, is another dashboard graphic.

    [Figure: bullet graph.][22] There is also a Google Gadget API for use in Google Docs that will produce this.[23]

    Spine Plots / Mosaic Plots / Matrix Charts

    These are best used for categorical data. Notice that we have added another dimension to the data by making the width of the bar proportional to the fraction of cars in that category (domestic versus foreign), thus taking the traditional bar chart and adding another level of data.

    Made with Stata's ado file spineplot. Jon Peltier has a solution for Excel which he calls a Matrix Chart.[24] It is available in the statistical language R as well.[25]

    [21] For some examples see http://www.ozgrid.com/excel-add-ins/spark-maker-explained.htm
    [22] The picture comes from Stephen Few's Perceptual Edge: http://www.perceptualedge.com/blog/?p=375
    [23] http://dealerdiagnostics.com/blog/2008/09/the-ddr-bullet-graph-gadget/
    [24] http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=508
    [25] http://ideas.repec.org/a/tsj/stataj/v8y2008i1p105-121.html


    Heat maps

    Heat maps are two-dimensional maps where the color intensity represents the underlying data. The above table on the right can be thought of as a heat map. The darker orange colors represent larger values. When choosing the different colors to use, designers rely on color theory. ColorBrewer is a useful website to make sure that viewers can clearly distinguish differences in your data.[26]
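    One simple way to turn a value into a color intensity is linear interpolation between a light and a dark color. The sketch below assumes a light and a dark orange that I picked for illustration; the function name shade and the continuous scale are likewise assumptions, and a classed scheme with a handful of distinct steps may be easier for viewers to tell apart.

```python
def shade(value, lo, hi, light=(255, 245, 235), dark=(217, 72, 1)):
    """Return a hex color interpolated between light (at lo) and dark (at hi)."""
    if hi == lo:
        t = 0.0
    else:
        # Clamp to [0, 1] so out-of-range values still get a valid color.
        t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    r, g, b = (round(a + t * (c - a)) for a, c in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


# Values from the leisure table used earlier.
for value in [125, 40, 30, 25, 10, 5]:
    print(value, shade(value, 5, 125))
```

    Larger values come out darker, matching the convention described above.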

    Choropleth Maps (Color Maps)

    Choropleth maps are a specific type of heat map where the two-dimensional object is a geographical map. The map is then painted with color based upon the intensity of the underlying variable. Often darker colors represent larger values of the underlying variable. This is a great way to visually represent data that varies geographically. The example below was produced with Stata and comes from some foreclosure data I have by county. The data represents the number of foreclosure filings as a percentage of housing units in each county for 2007, and the darker the shading of the county, the higher the rate of foreclosure filings in that county. Juneau County sticks out as the obvious county with the highest rate of foreclosure filings.

    A similar graph for the state of Wisconsin is below. Note that the shading has changed relative to the previous graph and is now based upon different intervals.

    [26] The website can be found here: http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html


    While I used a statistics program (Stata) to generate these graphics, there are many opportunities for producing your own choropleths on the web. Google Documents has added its own visualization tools, which include the ability to create choropleths for different countries.[27] These maps, and the geographic presentation of this data, intersect with the rapidly growing use of Geographic Information Systems (GIS) in economic geography. Can you imagine the marketing uses for this type of information?

    There are of course problems with these types of maps as well. They can mislead a viewer. The geographic area may be completely unrelated to the area at risk. For example, if the map represents foreclosure rates, as these do, you might think Juneau County represents a large economic problem for the region. However, the reality is that the population of Juneau is quite small relative to La Crosse, and while the foreclosure rate might be high, the total number of foreclosures is still quite small, because there are fewer houses in that county relative to some of the other counties. The fundamental problem is that the graphic invites you to infer economic importance in proportion to geographic size, when this is not true. One solution is to distort the geographic area based instead on the metric of interest.
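    The difference between a high rate and a high count is easy to see with a little arithmetic. The numbers below are invented purely for illustration (they are not the Wisconsin foreclosure data), and the county names are placeholders.

```python
# Hypothetical counties: one small with a high rate, one large with a low rate.
counties = {
    "Small County": {"housing_units": 10_000, "filings": 150},
    "Large County": {"housing_units": 50_000, "filings": 400},
}

# Rate = filings as a share of housing units, which is what the map shades.
rates = {name: c["filings"] / c["housing_units"] for name, c in counties.items()}

highest_rate = max(rates, key=rates.get)
most_filings = max(counties, key=lambda name: counties[name]["filings"])

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of units, {counties[name]['filings']} filings")
print("Darkest on a rate map:", highest_rate)
print("Most filings overall:", most_filings)
```

    A choropleth shaded by rate would make the small county the darkest, even though the large county accounts for most of the filings. Which of the two matters depends on the question you are asking.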

    Cartograms (Distorted Maps)

    Another example of using colors and maps comes from the following distorted maps, where the distortion is based upon some underlying variable, in this case alcohol consumption. Here the color only serves to demarcate the different countries. Rather than color intensity conveying the values of the underlying variable, we the creators have

    [27] Details on producing these maps can be found here: http://documents.google.com/support/spreadsheets/bin/answer.py?answer=91599 and here: http://googlesystem.blogspot.com/2008/02/data-visualization-google-gadgets.html


    distorted the size of each country in proportion to its alcohol consumption. There are some people who feel cartograms hide more than they reveal.[28]

    [Figure: cartogram, "Alcohol Consumption (2001)".][29]

    Another example of a cartogram comes from the recent election.[30] Below is a reinterpretation of the simplistic red/blue map you might have seen on TV or in the newspaper. Now the colors are shaded based upon the vote, rather than simply one color for each party based upon the majority vote in that state. The states are also distorted by the number of votes cast in that state.

    Compare that to the traditional depiction:

    [28] http://flowingdata.com/2008/11/13/alternative-to-cartograms-using-transparency/
    [29] The distorted maps presented here come from the following article: http://www.dailymail.co.uk/news/article-439315/How-world-really-shapes-up.html. Producing the distorted cartograms involves a substantial knowledge of programming and graph theory.
    [30] http://www-personal.umich.edu/~mejn/election/2008/


    Treemaps

    Treemaps are another type of heat map, well suited for hierarchical data. The classic example on the internet is the smartmoney.com map of the market.[31] Here the hierarchy from the bottom up is as follows. Start with individual stocks, which are grouped by company; each company is represented by its market capitalization (outstanding shares of that company times share price). Higher market capitalization for the firm means a larger area for its box. This would be the initial box. Then companies are further grouped together into a larger box by industry. The small boxes are then colored based upon the percentage gain or loss on the day, with green representing gains and red representing losses. Visually it is very important to distinguish gains from losses by different colors. That was the major shortcoming of a recent NY Times[32] heat map.

    [31] SmartMoney's map of the market is updated with a 15 minute delay. The site is here: http://www.smartmoney.com/map-of-the-market/
    [32] The graphic concerns the performance of the economy under different Presidents and it can be seen here: http://www.nytimes.com/interactive/2008/10/18/business/20081019-metrics-graphic.html


    A recent bad day on Wall Street is captured by the following.[33]

    It is possible to produce treemaps of your own, whether through Microsoft Research's Excel add-in[34] or the use of IBM's web software ManyEyes.[35] There are several examples

    [33] These data come from http://www.uie.com/brainsparks/2008/09/30/seeing-red-smartmoneycoms-map-of-the-market/
    [34] Microsoft provides an add-in for treemaps: http://www.gilsmethod.com/node/81


    of data you may have which could be represented by a treemap. Let's say you are working on a project which is looking at students' choice of major. The hierarchy from the top down could be:

    College > Major > number of students

    So the number of students determines the size of the box for each major. Then the majors are collected within the larger box of the college within which they are offered. The boxes could be colored by many different things. For example, let's say you were trying to get a sense of how many students change their major and what they change it to. You could then color the boxes by the percentage of the people in that major who have always had that major, or by the percentage that changed to that major within the last year.

    Another example could be looking at the time students spend in different activities. Let's say you ask them the average number of hours per week they spend doing several things, such as studying, going to class, reading, writing, etc. Again it would be possible for you to break these down. You could make the first level of boxes equal in size to the average percentage of time spent in the particular activity. The next level of boxes would involve grouping the activities into broader areas, say academic versus non-academic. Basically any data that can be grouped through some sort of hierarchy will make a good treemap.
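    Before any boxes are drawn, a treemap needs the share of the total area that each box should occupy. The sketch below computes those shares for the College > Major > number of students hierarchy. The colleges, majors and enrollment counts are invented for illustration, and the function name treemap_areas is mine; the actual layout of the rectangles is left to the treemap tool.

```python
# Hypothetical enrollment: college -> major -> number of students.
enrollment = {
    "Business": {"Economics": 120, "Finance": 200},
    "Science": {"Biology": 300, "Physics": 80},
}


def treemap_areas(tree):
    """Return each college's and each major's share of the total area."""
    total = sum(sum(majors.values()) for majors in tree.values())
    areas = {}
    for college, majors in tree.items():
        areas[college] = {
            # A college's box is the sum of the boxes of its majors.
            "share": sum(majors.values()) / total,
            "majors": {major: count / total for major, count in majors.items()},
        }
    return areas


areas = treemap_areas(enrollment)
for college, info in areas.items():
    print(f"{college}: {info['share']:.1%}")
    for major, share in info["majors"].items():
        print(f"  {major}: {share:.1%}")
```

    The same structure works for the time-use example: replace colleges with broad areas such as academic and non-academic, and majors with the individual activities.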

    Some examples of brilliant dynamic web treemaps are provided by the New York Times article on changes in inflation.[36] The New York Times also uses treemaps in a recent graphic depicting the year of heavy losses on Wall Street.[37]

    Bump Charts

    Bump charts are a good way of showing changes in rank order. Below, The New York Times talks about the challenges which face the US and other countries on infant mortality.[38] Where would you rather have an infant born, the US or Singapore? According to the chart, Singapore. However, remember that this is measuring the number of deaths of infants (one year of age or younger) per 1,000 live births. We are more likely than other countries to have successful preterm births, but this group is very much at risk for early death.
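    The data behind a bump chart is just a rank for each item in each period. Here is a minimal sketch of that step, using placeholder country names and invented rates rather than the figures from the New York Times graphic; ties are not handled.

```python
# Hypothetical infant deaths per 1,000 live births, by year and country.
rates_by_year = {
    1960: {"A": 26.0, "B": 35.0, "C": 30.0},
    2004: {"A": 6.8, "B": 2.0, "C": 5.0},
}


def rank_order(values):
    """Rank names from lowest value (rank 1) to highest."""
    ordered = sorted(values, key=values.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = {year: rank_order(values) for year, values in rates_by_year.items()}
for year, year_ranks in ranks.items():
    print(year, year_ranks)
```

    Plotting each country's rank against the year, and connecting the points, gives the bump chart. In this made-up example country A falls from first to third even though its rate improved, which is the kind of change in rank order a bump chart is meant to expose.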

    [35] The service is available here: http://services.alphaworks.ibm.com/manyeyes/page/Treemap_for_Comparisons.html
    [36] A look at recent inflation: http://www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html?scp=1&sq=inflation%20chart&st=cse
    [37] http://www.nytimes.com/interactive/2008/09/15/business/20080916-treemap-graphic.html
    [38] http://www.nytimes.com/2009/04/07/health/07stat.html?ref=science


    Word Clouds

    Word clouds are good for representing responses to open-ended questions.[39]

    This is from the following question:

    Looking ahead, which would you say is more likely - that in the country as a whole we'll have continuous good times during the next five years or so, or that we will have periods of widespread unemployment or depression?

    A. Good times
    B. Widespread unemployment or depression
    C. Other, please specify

    The word cloud is composed of the responses to the "C. Other, please specify" answer; I have removed the first two.

    [39] An easy-to-use website, http://wordle.net/, allows you to produce your own word clouds.


    There are problems with this type of presentation. First, since the responses to the "other" answer were actually short phrases, we don't really capture the full phrase, but rather the frequency of the words. As a demonstration of this problem, let's say ten people said "good times" and ten said "bad times". Since the word "times" appears in both, it will be the most frequent response (appearing 20 times) and therefore the largest. But that doesn't tell us much about the sentiment being conveyed by the respondents.
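    The demonstration above can be checked with a quick word count. This sketch simply splits each response on spaces, which is an assumption about how a word cloud tool tokenizes the text.

```python
from collections import Counter

# Ten respondents said "good times" and ten said "bad times".
responses = ["good times"] * 10 + ["bad times"] * 10

# Count individual words, as a word cloud does.
word_counts = Counter(word for response in responses for word in response.split())
print(word_counts.most_common())
```

    The word "times" tops the list with 20 occurrences while "good" and "bad" have 10 each, so the largest word in the cloud says nothing about which way the respondents lean.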


    This is solved below by tying all the words of a single response together with the tilde (~). Joining the words with a ~ like this (joined~words) allows Wordle to produce a phrase cloud, which is a great way of visualizing responses to questions with 5 or so categories, where a phrase represents each category. This is very easy to do in Excel: just highlight the column and do a find and replace where you put a blank space in the find and a ~ (tilde) in the replace. Then copy and paste the text into Wordle. Done.
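    If the responses live somewhere other than Excel, the same find-and-replace can be done in a couple of lines. This is a sketch under the assumption that each response is one string; the helper name join_phrase is mine, and the tilde behavior is Wordle's as described above.

```python
from collections import Counter


def join_phrase(response):
    """Replace the spaces inside one response with tildes so the phrase stays together."""
    return "~".join(response.split())


responses = ["good times"] * 10 + ["bad times"] * 10
joined = [join_phrase(response) for response in responses]

# Text ready to paste into the word cloud tool, one response per line.
text_for_wordle = "\n".join(joined)

# Each phrase is now counted as a single unit.
phrase_counts = Counter(joined)
print(phrase_counts)
```

    With the phrases joined, "good~times" and "bad~times" each count 10, and the misleading 20 for "times" disappears.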

    The other problem with this presentation is that it doesn't visually direct and steer the eye while making the point. Your eye wanders all over the place.

    Using the question:

    When you think about the property taxes you or your landlord pay on the home in which you live and the services you receive for those taxes, would you say property taxes in Wisconsin (or your state of residence) are much too high, somewhat too high, about right, somewhat too low or much too low?

    Answers that are joined are:

    a. Much too high
    b. Somewhat too high
    c. About right
    d. Somewhat too low
    e. Much too low
    f. Other


    One could easily list the words by frequency from greatest to least, but word clouds are popular because they are more than just data: they are art. They invite the observer in, even if the observer gets a little lost in the presentation. Sometimes efficiently conveying information is sacrificed for the visual aesthetic of good design. One example where the art matters more than some of the underlying data follows.[40]

    [40] This graphic comes from the website http://www.pitchinteractive.com/election2008/. More artistic visualizations can be found here: http://www.visualcomplexity.com/vc/ and Slate has an excellent collection of artistic visualizations here: http://www.slate.com/id/2197749/


    The edge of the doughnut lists the names of donors to the 2008 presidential campaigns. Clearly at this level of presentation you cannot read the names. However, it still gets some ideas across, like the disproportionate amount of funds raised by Obama relative to McCain.

    Bubble Charts

    Bubble charts allow you to present three variables in two dimensions. They are basically traditional XY scatter plots, where the size of the bubble is proportional to a third variable. In the case below the scatter plot represents the unemployment rate and foreclosure rate for each of the Wisconsin counties in the 7 Rivers Region, and the size of the bubble is proportional to the population of the county. It is a static presentation for one year, 2007.

    [Figure: bubble chart, "7 Rivers Region 2007", plotting Unemployment Rate against Foreclosure Rate, with bubbles labeled Jackson, Juneau, La Crosse, Monroe, Trempealeau and Vernon.]
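    When sizing bubbles by a third variable, it matters whether the value drives the radius or the area. The sketch below assumes the area should be proportional to the value, which means the radius grows with the square root. That is my assumption about what "size" should mean, not a statement of how the chart above was drawn, and the population figures are invented.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius such that the bubble's AREA is proportional to value."""
    if value <= 0 or max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


# Hypothetical county populations, for illustration only.
populations = {"County A": 100_000, "County B": 25_000}
largest = max(populations.values())

for county, population in populations.items():
    print(county, round(bubble_radius(population, largest), 1))
```

    A county with a quarter of the population gets half the radius and therefore a quarter of the area. Scaling the radius directly would instead make it look one sixteenth the size, exaggerating the difference.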

    Another example, which highlights the problem of too many colors competing for attention, can be found below. In example A the mind gets lost, whereas example B does a good job of highlighting, with context, the data in the orange circle.41

    41 http://charts.jorgecamoes.com/is-data-visualization-useful/

    Dynamic bubble charts allow you to plot the above for different years, so you can watch the data change over time. I've produced some examples using the foreclosure data to give you another idea for presenting the data42.

    One of the best examples of dynamic bubble charts can be found at Gapminder.43 How would you insert them into presentations? In the past I have posted them to a webpage and rendered them separately, or within PowerPoint. Obviously this type of presentation is not (currently) possible in a written report. I imagine that technology is not far behind; you could imagine Amazon's Kindle bridging the gap.

    These are beautiful graphics from the New York Times44. They might be difficult for you to re-create, but they should get you thinking about how data can be presented in a way that is graphically pleasing and at the same time informative.

    Presenting data in a written format requires different techniques than presenting the same data orally. In a written piece the reader has more time to dig into the data, so the graph or chart can be more complex, as the New York Times pieces are.

    In the case of a PowerPoint presentation, keep it simple and active. When science meets art, as in the case of graphs and design, it is important to realize there will be differences of opinion, and there is less likely to be an objective standard. Some arguments will be over design, and some over the content. Always ask yourself who your audience is, what the point of the graph is, and whether your design is in fact conveying what you want it to45. The following represents some important differences in preferences, but also important differences in terms of information presented. Some other tips can be found at the links46.

    Dynamic/Interactive Graphs

    These graphs can be dynamic in the sense that they are constantly updated and changing, either due to the influx of new data or interactive manipulations by the viewer.

    42 http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure.htm and
       http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure-state.htm

    43 http://googlegadgetsapi.blogspot.com/2008/06/spreadsheet-gadgets-free-dynamic-data.html
       http://code.google.com/apis/visualization/documentation/gadgetgallery.html

    44 Movies: http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html
       NY Times on spending: http://www.nytimes.com/interactive/2008/09/04/business/20080907-metrics-graphic.html
       Drug admts: http://www.nytimes.com/2008/06/14/opinion/14blow.html?_r=3&oref=slogin&oref=slogin&oref=slogin

    45 http://sethgodin.typepad.com/seths_blog/2008/07/the-three-laws.html
       http://sethgodin.typepad.com/seths_blog/2008/07/bar-graphs-vs-p.html
       http://peltiertech.com/WordPress/2008/07/12/bar-graphs-vs-pie-charts/
       http://www.perceptualedge.com/blog/?p=247
       http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/

    46 http://www.macworld.com/article/134708/2008/07/chartsandgraphs.html?t=103
       http://www.giantflightlessbirds.com/workshops/better_graphs.pdf
       Some Excel tips: http://charts.jorgecamoes.com/category/how-to-and-tips/
       http://services.alphaworks.ibm.com/manyeyes/app
       And another link: http://www.decisionsciencenews.com/?p=475

    Data Visualization in Seminars/Talks/Presentations

    When the audience is in front of you rather than at home in front of their computer, you are responsible for grabbing their attention and keeping them awake.

    Here is an example of the principle of simplicity in the presentation of data in a lecture/talk/seminar. The chart below contains three values: the percentage of water in the body, the brain and the blood. Put yourself in the shoes of the audience if you saw this chart. Interesting? Mind numbing?

    [Figure: bar chart of Percent Water for body, brain and blood; the value axis runs from 0 to 90.]

    Now what if I presented these same three pieces of data in three different PowerPoint slides?

    We could present the boring bar chart. It's simple and easy to understand, but not visually stimulating. It is more data dense than the three slides, yet I think you will agree the three slides would have a bigger impact in a presentation. They engage the audience visually in a way the bar chart does not. The slides came from the award-winning presentation entitled Thirst47.

    Another must-see slide presentation, entitled Death by PowerPoint48, is available at SlideShare. Garr Reynolds also makes a good section of his book Presentation Zen available through his blog, where he details the 4 principles of design: Contrast, Repetition, Alignment, and Proximity49.

    Contrast

    47 Thirst won the 2008 award for the World's Best Presentation from Slideshare.com: http://www.slideshare.net/jbrenman/thirst

    48 Slideshare has several good presentations on how to present. Death by PowerPoint: http://www.slideshare.net/thecroaker/death-by-powerpoint and Presenting With Text: http://www.slideshare.net/girba/presenting-with-text

    49 Part of Chapter 6 can be downloaded here: http://www.presentationzen.com/chapter6_pages.pdf

    Repetition

    Alignment and Proximity

    When thinking about PowerPoint design, think about other technology. What do we love about Apple? Simple design. What do we love about Facebook? The design and interface are much cleaner than most MySpace pages, though sadly that is changing50. Google redefined simple and clean, and I am convinced that it helped fuel their early success. Did I mention I think simplicity is important? Avoid all of the visual crap that Microsoft seems to think is important.

    Good presentations are about more than just good slide design. They are also about being a good speaker and telling a good story. How do you learn this? Watch a few great presentations. Pay attention to how the speakers interact with the audience and how they've organized their thoughts. A great presentation by Hans Rosling can be found in the link below51. In fact, most of the TED talks are useful examples of good, succinct presentations52,53.

    50 See this article: http://www.readwriteweb.com/archives/is_facebook_becoming_myspace.php

    Some general principles of slide design by Garr Reynolds at Presentation Zen can be found at the link54. He makes the important point that slides should have a high signal-to-noise ratio55.

    Nancy Duarte of Duarte Design, responsible for designing some of the best TED talks and Al Gore's An Inconvenient Truth, provides a wonderful webinar on using PowerPoint56. Nancy also has an excellent book entitled Slide:ology.57

    A link to some insights on the presentations of Steve Jobs58.

    And please, no bullet points59.

    51 http://www.youtube.com/watch?v=hVimVzgtD6w

    52 http://www.ted.com/

    53 Additional notes on good presentation organization can be found here: http://www.extremepresentation.com/

    54 http://www.presentationzen.com/presentationzen/2008/08/learning-from-the-design-around-you-ikea.html

    55 http://www.presentationzen.com/presentationzen/2007/03/a_few_weeks_ago.html

    56 http://www.vizthink.com/blog/2008/06/18/webinar-creating-powerful-presentations-with-nancy-duarte/

    57 http://www.amazon.com/slide-ology-Science-Creating-Presentations/dp/0596522347/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1238982954&sr=8-1

    58 http://images.businessweek.com/ss/09/09/0929_jobs_presentations/1.htm

    59 http://aralbalkan.com/1286

    Finally, lest you think there is no fun in data visualization, here are some funny graphs60.

    Some Dos and Don'ts

    I hate to give you a list of things to do and things not to do because, as with any rules, there are times when they should be broken. However, by giving you some rules, I hope you will break them only when you have good reason to.

    Don't                               Do
    ---------------------------------   ------------------------------
    Use 3-D graphics in Excel
    Use Microsoft clip art              Use pictures
    Use a PowerPoint design template    Use repetition in your design
    Read your presentation              Practice/rehearse presentation
    Use bullet points                   Keep each slide to one idea

    References and Endnotes

    Some useful links to data visualization blogs and leading thinkers in the infovis world:

    http://junkcharts.typepad.com/
    http://www.visualcomplexity.com/vc/
    http://www.edwardtufte.com/tufte/
    http://www.perceptualedge.com/
    http://infoclarity.blogspot.com/
    http://eagereyes.org/
    http://charts.jorgecamoes.com/
    http://visualizeit.wordpress.com/
    http://www.visualizingeconomics.com
    http://www.juiceanalytics.com/writing/

    Presentation Related Blogs

    http://blog.duarte.com/
    http://www.presentationzen.com/presentationzen/

    60 http://graphjam.com/

    Appendix: TIPS for Excel 2007

    How to change the axis of a chart to the logarithmic scale.

    From http://office.microsoft.com/en-us/excel/HP030656791033.aspx

    Make changes to the scales of value axes

    1. On a chart sheet or in an embedded chart, click the value (y) axis that you want to change.

    2. On the Format menu, click Selected Axis.

    3. On the Scale tab, do one of the following:

       - To change the number at which the value (y) axis starts and ends, type a different number in the Minimum box or the Maximum box.

       - To change the interval of tick marks and gridlines, type a different number in the Major unit box or Minor unit box.

       - To change the units displayed on the value (y) axis, click the units that you want or type a numeric value in the Display units list.

       - To show a label that describes the units expressed, select the Show display units label on chart check box.

         Tip: If your chart values consist of large numbers, you can make the axis text shorter and more readable by changing the display unit of the axis. For example, if the chart values range from 1,000,000 to 50,000,000, you can display the numbers as 1 to 50 on the axis and show a label that indicates that the units express millions.

       - To change the value (y) axis to logarithmic, select the Logarithmic scale check box.

       - To reverse values so that you can flip bars or columns or other data markers, select the Values in reverse order check box.
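    To see what the Logarithmic scale option changes, it can help to compute where values land on such an axis. The sketch below is a plain Python illustration under stated assumptions, not a description of Excel's internals: the function names log_axis_position and decade_ticks are hypothetical, and the axis bounds of 1 and 1,000 are made up for the example. On a base-10 log axis a value's position depends on its logarithm, so equal ratios (1 to 10, 10 to 100) cover equal distances.

```python
import math


def log_axis_position(value, axis_min, axis_max):
    """Return the fraction (0.0 to 1.0) of the way along a base-10 log axis."""
    if min(value, axis_min, axis_max) <= 0:
        raise ValueError("a log axis can only show positive numbers")
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


def decade_ticks(axis_min, axis_max):
    """Return the powers of ten that fall between the axis bounds (inclusive)."""
    first = math.ceil(math.log10(axis_min))
    last = math.floor(math.log10(axis_max))
    return [10 ** exponent for exponent in range(first, last + 1)]


# Hypothetical axis running from 1 to 1,000.
for tick in decade_ticks(1, 1000):
    share = log_axis_position(tick, 1, 1000)
    print(f"{tick:>5} sits {share:.0%} of the way up the axis")
```

    Each tenfold increase moves the same distance up the axis, which is why a log scale suits data that grow by multiples rather than by fixed amounts. It also shows why zero and negative values cannot be plotted on a log axis.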

    How to use the Histogram add-in in Excel

    http://support.microsoft.com/kb/214269

    SUMMARY

    This step-by-step article describes how to create a histogram with a chart from a sample set of data. The Analysis ToolPak that is included with Microsoft Excel includes a Histogram tool.

    Verify Installation of the Analysis ToolPak

    Before you use the Histogram tool, you need to make sure the Analysis ToolPak Add-in is installed. To verify whether the Analysis ToolPak is installed, follow these steps:

    1. In Microsoft Office Excel 2003 and in earlier versions of Excel, click Add-Ins on the Tools menu.

       In Microsoft Office Excel 2007, follow these steps:

       a. Click the Microsoft Office Button, and then click Excel Options.

       b. Click the Add-Ins category.

       c. In the Manage list, select Excel Add-ins, and then click Go.

    2. In the Add-Ins dialog box, make sure that the Analysis ToolPak check box under Add-Ins available is selected. Click OK.

    NOTE: In order for the Analysis ToolPak to be shown in the Add-Ins dialog box, it must be installed on your computer. If you do not see Analysis ToolPak in the Add-Ins dialog box, run Microsoft Excel Setup and add this component to the list of installed items.

    Create a Histogram

    1. Type the following in a new worksheet:

    A1: 87 B1: 20

    A2: 27 B2: 40

    A3: 45 B3: 60

    A4: 62 B4: 80

    A5: 3 B5:

    A6: 52 B6:

    A7: 20 B7:

    A8: 43 B8:

    A9: 74 B9:

    A10: 61 B10:

    2. In Excel 2003 and in earlier versions of Excel, click Data Analysis on the Tools menu.

       In Excel 2007, click Data Analysis in the Analysis group on the Data tab.

    3. In the Data Analysis dialog box, click Histogram, and then click OK.

    4. In the Input Range box, type A1:A10.

    5. In the Bin Range box, type B1:B4.

    6. Under Output Options, click New Workbook, select the Chart Output check box, and then click OK. A new workbook with a Histogram table and an embedded chart is generated.

    Based on the sample data from step 1, the Histogram table will look like the following table:

    A1: Bin B1: Frequency

    A2: 20 B2: 2

    A3: 40 B3: 1

    A4: 60 B4: 3

    A5: 80 B5: 3

    A6: More B6: 1

    Your chart will be a column chart that reflects the data in this Histogram table.

    Excel counts the number of data points in each data bin. A data point is included in a particular data bin if the number is greater than the bin's lower bound and less than or equal to its upper bound. In the example here, the bin that corresponds to data values from 0 to 20 contains two data points, 3 and 20.
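    The counting rule just described can be checked by hand or with a few lines of code. The following is a minimal Python sketch of that rule, not Excel's own implementation; the function name excel_style_histogram is hypothetical. It uses the sample data and bin values from step 1, places each value in the first bin whose upper bound it does not exceed, and counts anything above the last bound under "More".

```python
from bisect import bisect_left


def excel_style_histogram(data, bins):
    """Count data points per bin using the rule: lower bound < x <= upper bound.

    Values larger than the last bin bound are counted under "More".
    """
    bounds = sorted(bins)
    counts = [0] * (len(bounds) + 1)
    for x in data:
        # bisect_left finds the first bound that is >= x, so a value equal
        # to a bound is counted in that bound's bin, not the next one.
        counts[bisect_left(bounds, x)] += 1
    labels = [str(b) for b in bounds] + ["More"]
    return list(zip(labels, counts))


# Sample data (A1:A10) and bin values (B1:B4) from step 1.
data = [87, 27, 45, 62, 3, 52, 20, 43, 74, 61]
bins = [20, 40, 60, 80]

print("Bin   Frequency")
for label, count in excel_style_histogram(data, bins):
    print(f"{label:<5} {count}")
```

    The counts produced this way can be compared line by line against the Histogram table above. The value 20 is counted in the 20 bin rather than the 40 bin, which is the boundary case the rule is meant to settle.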

    If you omit the bin range, Excel creates a set of evenly distributed bins between the data's minimum and maximum values.

    NOTE: You will not be able to create the Histogram chart if you specify the options (Output Range or New Worksheet Ply) that create the Histogram table in the same workbook as your data.