
Impact of Data Variation on Efficacy of Cartograms and Choropleth Maps

Juliette Love
Stanford University

ABSTRACT

The scientific foundation for graphical construction is well-established. Many studies are available that can guide visualization designers as to which graphical choices will lead to greatest visualization efficacy. However, the field of value-by-area cartograms, and indeed all geographically-binned data visualizations, is lacking in a basic foundation for the efficacy of such designs. The problem of cartogram efficacy has been tackled only in perceptual studies that contrast a cartogram with an alternative projection of a singular dataset and use the results to make declarative claims about the effectiveness of cartograms, or more commonly, the lack thereof; none of these previous studies account for the character of the dataset to be visualized. This paper outlines the rationalization behind value-by-area cartogram construction, puts forth a theory about the effect of data variation on graphical perception of cartograms and choropleth maps, and validates this theory based on a user study. The results of the study provide a guideline, based in data variation, for the use of non-contiguous cartograms and choropleth maps.

1 INTRODUCTION

1.1 Cartograms

Value-by-area cartograms are a technique for visualizing geographically-binned datasets. These bins can be counties, states, countries, or any other bounded geographical areas; a cartogram represents a quantitative field across a geographic domain by sizing the area of each bin proportional to its corresponding value in the dataset. Cartograms can be used for both univariate and bivariate datasets, discounting the geographic variable. A common example of a univariate cartogram is a population cartogram, where each geographic bin is sized by its population. A traditional map is essentially a cartogram where the dataset contains the land area of each bin.

Cartograms can also display multivariate datasets by using additional encoding types, such as color. They are often used in cases when an understanding of the first metric, encoded in area, is required for an understanding of the second. A common example of such a visualization is the US presidential election map. The first metric, the number of electoral votes for each state, is important in understanding the outcome of the election. Because the number of electoral votes is not area-dependent, simply coloring each state based on candidate choice would not provide any information regarding the outcome of the election. However, sizing the states by electoral votes and then coloring each state by candidate choice leaves the viewer with a simple comparison of the areas encompassed by the two colors in order to determine the outcome of the election. There are a few additional benefits of cartograms other than comparing a metric across the entire mapped area. Because the viewer is familiar with the original size of the bins in most cartograms, cartograms allow the viewer to compare the newly-sized area with the original area of the shape. Thus, the viewer can determine whether the metric is large or small in a given bin compared to the geographical area, as well as see regional patterns in the data based on the change in area size.

Figure 1: A choropleth map and a cartogram with 2016 election results (Newman 2016) [10]

1.2 Perception

In a landmark visualization study, Cleveland & McGill (1984) described graphical perception as a means of measuring visualization effectiveness. Their goal was to establish a “scientific foundation” for the design of graphical data representations [1]. Based on this research, Mackinlay (1986) created an ordering of different graphical encodings (such as position, area, and color) based on the relative perceptual efficacy of those encodings [9]. The study took into account features of the dataset (such as the data type), and adjusted the rankings for each of these features.

However, maps are a special case of visualization design that cannot be considered in the same way as other graphical representations. The viewers’ understanding of the data is influenced by their level of familiarity with the basemap, particularly in the case of value-by-area cartograms. As a result, the ranking of encodings outlined by Mackinlay does not hold true for geographic visualizations. According to Mackinlay’s rankings, area is a more perceptually accurate encoding method than color or saturation for quantitative datasets; however, multiple studies such as Sun & Li (2010) have shown that for at least some datasets, choropleths (which encode in color or saturation) are more perceptually effective than cartograms (which encode in area) [14]. Thus, there is a need for a similar perceptual foundation for cartogram and choropleth design that reconciles these approaches. This paper aims to provide a part of this foundation by tackling the relative perceptual efficacies of cartograms and choropleth maps based on the variation in the visualized dataset. Specifically, it attempts to quantify how much data variation is required for the benefit of encoding in size to outweigh the cost of map distortion, so that the cartogram achieves the higher perceptual accuracy.

2 RELEVANT WORK

The comparative efficacy of various thematic maps was first studied by Dunn (1987, 1988), who compared choropleth maps and framed rectangle maps using state-level murder data [3, 4]. This study was conducted using both univariate and multivariate versions of the murder rate dataset, and concluded that framed rectangle plots produced more accurate comprehension by viewers. Although Griffin (1983) [7] first discussed the negative effects of distortion on value-by-area cartograms, the comparative perceptual accuracy of cartograms and other thematic maps was not understood in any depth until Tobler (2004) [15] provided a qualitative analysis of the benefits and costs of using cartograms.

Since then, multiple papers have dealt with the perceptual difficulties of cartograms and attempted to provide alternative solutions. Stewart & Kennelly (2010) present a 3D raised model that combines the principles of a cartogram, a choropleth map, and a framed rectangle map by encoding the first metric in volume instead of area, and the second metric in color [13]. Another alternative solution is the value-by-alpha map described by Roth et al. (2010), which displays bivariate data by using two color axes (one for the A/B metric such as election result, encoded in hue, and one for population or some other normalization, encoded in alpha value) [12]. The two-color-axis method (such as value-by-alpha) is commonly used to visualize the type of multivariate dataset used in this study, so the choropleth maps for this study are generated using the Roth et al. method.
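The two-color-axis idea can be illustrated with a small sketch. This is not the Roth et al. implementation: the function name `value_by_alpha_rgb`, the linear mapping from value to alpha, and the white background are all assumptions made for illustration.

```python
def value_by_alpha_rgb(base_rgb, value, max_value):
    """Blend a base hue over a white background, with alpha proportional to value.

    Assumption: alpha scales linearly with value / max_value and is clamped
    to [0, 1]; a real value-by-alpha map may use a different mapping.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    alpha = min(max(value / max_value, 0.0), 1.0)
    # Simple alpha compositing over white (255, 255, 255).
    return tuple(round(alpha * c + (1.0 - alpha) * 255) for c in base_rgb)


RED = (200, 30, 30)  # hypothetical base color for one of the two classes
print(value_by_alpha_rgb(RED, 55, 55))  # largest value: fully opaque base color
print(value_by_alpha_rgb(RED, 3, 55))   # small value: close to white
```

Bins with small values fade toward the background, which is why the quantitative variable becomes harder to read as it matters more.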

Perhaps the most relevant study in this area is Sun & Li (2010) [14], which aimed to compare the perceptual effectiveness of cartograms and thematic maps. The study included multiple types of cartograms (Dorling cartograms [2], non-contiguous shape-preserving cartograms, contiguous (Gastner-Newman) cartograms [6], and rectangularly-binned pseudo-cartograms) and compared these projections to various equivalent thematic maps: proportional symbol maps, choropleth maps, and pseudo-thematic maps. The relevant experiment within the study analyzed these visualizations using a dataset of the state-level results of the 1996 US presidential election. Sun & Li found that for this dataset, the participants understood the dataset more accurately when viewing the cartogram than they did when viewing the corresponding thematic map. To account for the decrease in geographic recognition due to area distortion that accompanies datasets with greater variation than their test dataset, they suggested an algorithmic compromise between accuracy and recognizability. This study challenges the idea that such a compromise is necessary. By expanding beyond a single dataset of arbitrary variation, it is possible to understand how variation influences perception of thematic maps and cartograms. This study employs survey methods similar to those of Sun & Li.

3 THEORY

Both cartograms and choropleth maps encode one variable in hue, but they differ in that cartograms encode the second variable with area whereas choropleths encode the second variable in some metric related to lightness or saturation. The encoding type for the second variable decreases the perceptual accuracy of both choropleth maps and cartograms, but for different reasons.

3.1 Choropleth Maps

It is easiest to understand these different projections through the example of election maps described above. For both choropleth maps and cartograms, the candidate choice is encoded in color (either red or blue). For choropleth maps, the number of electoral votes per state is encoded in saturation, brightness, or alpha value. The ranking of encodings proposed by Mackinlay (1986) implies that encoding in saturation (or a related metric) leads to poor perceptual accuracy [9]. If the second variable has relatively little variation across the geographic area, that variable has little effect on comprehension; as the variation increases, factoring the second variable into visual analysis becomes crucial for coming to a correct judgement about the underlying data. In map A (Fig. 3), the second variable has relatively low variance, so a viewer can still come to a reasonable understanding of the dataset across the geographic domain without including the varying alpha values. However, in map B, which has a relatively high variance for the second variable, the changes in alpha value are more important to understanding the underlying data.

Figure 2: 2008 election results data as a value-by-alpha map (Roth 2010)

Figure 3: Map A (low-variance); Map B (high-variance)

3.2 Cartograms

Cartograms are created by sizing the geographic bins (e.g. states) based on the second metric (electoral votes in the above example). More precisely, for a metric $y$ across each shape $i$ with area $a_i$, a cartogram is created to have

$$a_i = k\,y(i) \quad \forall i \tag{1}$$
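As a rough illustration of Eq. (1) for a shape-preserving non-contiguous cartogram, the sketch below computes a linear scale factor for each bin. The function name `cartogram_scale_factors` and the choice of $k$ (picked so that the total mapped area stays unchanged) are assumptions for illustration, not details taken from the study.

```python
import math


def cartogram_scale_factors(original_areas, values):
    """Return one linear scale factor per bin so that new areas satisfy a_i = k * y_i."""
    if len(original_areas) != len(values):
        raise ValueError("need exactly one value per bin")
    if any(a <= 0 for a in original_areas) or any(y < 0 for y in values):
        raise ValueError("areas must be positive and values non-negative")
    if sum(values) == 0:
        raise ValueError("values must not all be zero")
    # Assumption: choose k so that the total mapped area is preserved.
    k = sum(original_areas) / sum(values)
    # Scaling a shape linearly by s multiplies its area by s**2,
    # so the factor for bin i is sqrt(k * y_i / A_i).
    return [math.sqrt(k * y / a) for a, y in zip(original_areas, values)]


areas = [4.0, 1.0]    # hypothetical original bin areas
values = [1.0, 4.0]   # hypothetical metric values
factors = cartogram_scale_factors(areas, values)
new_areas = [a * s ** 2 for a, s in zip(areas, factors)]
print(factors, new_areas)
```

Each shape is scaled about its own centroid, so shapes are preserved while borders between neighbors are not.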

However, there are many ways to achieve this effect. A cartogram can preserve borders between geographic areas but distort the shapes of those areas (contiguous cartogram), preserve shape but not borders (shape-preserving non-contiguous cartogram), or preserve neither shape nor boundaries (Dorling cartogram or rectangular cartogram). The relative efficacy of these types of cartograms can be judged with respect to the goals of cartogram creation. Cartograms (or any type of map projection) are used to display geographical datasets so the viewer can identify geographically-based patterns in the dataset; otherwise, if the geography does not inform the understanding of the data, the data would be more effectively visualized in a graph or other non-geographically-based chart. Contiguous cartograms impose an additional perceptual cost on the viewer due to the distortion of state shapes in addition to the resizing of the areas, and this distortion can cause the viewer to lose sense of the geography of the map (Yau 2008) [16]. This negates the benefits of displaying the data geographically, causing many to dismiss contiguous cartograms as merely an artistic representation without much analytical value.

Figure 4: Contiguous cartogram, shape-preserving non-contiguous cartogram, Dorling cartogram

Figure 5: Low-variance and high-variance choropleths and cartograms

Dorling cartograms have a higher perceptual cost than shape-preserving cartograms because although both projections have a similar layout, Dorling cartograms do not include the additional geographical grounding that preserving the state shapes provides. Thus, in order to reduce shape distortion and retain the ability to understand geographic trends, this study uses shape-preserving non-contiguous cartograms. However, regardless of cartogram type, perceptual accuracy is at least partially reduced due to the discrepancy between the viewer’s preconceived idea of the map, which stems from their familiarity with the geography, and the new map once it has been distorted.

3.3 Comparing Cartograms and Choropleths

Determining which map is more perceptually accurate depends on which of the two projection types has greater perceptual drawbacks: whether the area distortion caused by cartograms is more difficult for a viewer to overcome than the difficulty of understanding choropleths that encode data in saturation. However, these costs will not be the same across all possible datasets. Since the difference between the two projections comes in the encoding for the second (quantitative) variable, the variation in the visualized dataset would affect the perceptual cost: with greater data variation, the importance of this variable increases. For the low-variance cartogram and choropleth (A) in Fig. 5, the second variable is less important in determining the relationship $k$ for color indicators $r_i, b_i \in \{0, 1\}$ and area $a_i$ than it is for the high-variance maps (B), where

$$\sum_i a_i r_i = k \sum_i a_i b_i \tag{2}$$
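A minimal sketch of Eq. (2), using made-up numbers rather than data from the study, shows why the quantitative variable matters more at high variation. The helper `red_blue_ratio` and both example datasets are hypothetical.

```python
def red_blue_ratio(areas, is_red):
    """Return k from Eq. (2): total red-weighted area divided by total blue-weighted area."""
    red = sum(a for a, r in zip(areas, is_red) if r)
    blue = sum(a for a, r in zip(areas, is_red) if not r)
    if blue == 0:
        raise ValueError("no blue area; k is undefined")
    return red / blue


colors = [True, True, True, False, False]   # three red bins, two blue bins

low_variance = [10, 11, 9, 10, 10]          # hypothetical values, nearly equal
high_variance = [2, 3, 1, 30, 14]           # hypothetical values, widely spread

# With low variance, counting bins (3 red vs 2 blue) agrees with the weighted answer.
print(red_blue_ratio(low_variance, colors))
# With high variance, the weighted answer flips to "more blue" despite three red bins.
print(red_blue_ratio(high_variance, colors))
```

A viewer who ignores the second variable would still classify the low-variance map correctly, but would misclassify the high-variance one.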

As a result, the single-dataset approach that most studies in this area take is problematic. This study attempts to provide a more thorough analysis of which projection is more effective.

4 METHODS

In order to determine which visualization design was more effective for which levels of data variation, an online survey was conducted. The subjects were 100 individuals with varying degrees of familiarity with cartograms and choropleth maps. The participants were unaware that the experiment was designed to measure the effect of dataset variation on perception, and the survey provided no indication of the relative variation of any of the maps. There was no time limit on the study.

Figure 6: Sample choropleth map; sample cartogram.

Participants were shown six cartograms and six choropleth maps of the contiguous United States with different levels of data variation, in randomized order. In addition to the geography, these maps displayed one quantitative variable, encoded in alpha value (in the case of the choropleth map) or state area (in the case of the cartogram), and each state was colored either red or blue. The participants were asked to classify the cartograms and choropleth maps of different levels of variation as having ‘more red’ or ‘more blue.’ The colors red and blue were chosen to increase the visual similarity of the study maps to election maps, as people are already familiar with classifying a dataset of that type. Fig. 6 shows a sample of the two types of maps.

To minimize unintended perceptual differences between the maps as much as possible, a few controls were put in place. Firstly, although the datasets were different for each level of variation, the difference in total red and total blue was constant. For state $i$ with color indicators $r_i, b_i \in \{0, 1\}$, area $a_i$ and alpha value $\alpha_i$:

$$\sum_i a_i \alpha_i r_i - \sum_i a_i \alpha_i b_i = c_1 \tag{3}$$

In the case of cartograms, the value of $\alpha_i$ remains the same across all $i$; for choropleth maps, $a_i$ is the true geographic area of the bin.
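The control in Eq. (3) can be checked mechanically. The sketch below uses two hypothetical cartogram-style datasets (constant alpha, varying area) with different spreads but the same red-minus-blue total; the function name and the numbers are illustrative assumptions, not the study's data.

```python
def red_minus_blue(areas, alphas, is_red):
    """Left-hand side of Eq. (3): alpha-weighted red area minus alpha-weighted blue area."""
    total = 0.0
    for area, alpha, red in zip(areas, alphas, is_red):
        weight = area * alpha
        total += weight if red else -weight
    return total


# Cartogram form: alpha is constant, the area carries the data.
alphas = [1.0, 1.0, 1.0, 1.0]

low = red_minus_blue([10, 10, 10, 10], alphas, [True, True, True, False])
high = red_minus_blue([28, 2, 8, 2], alphas, [True, True, False, False])

# Both hypothetical datasets give the same constant c_1 despite different variation.
print(low, high)
```

For a choropleth the same function applies with the true geographic areas and per-state alpha values.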

Additionally, as seen in Fig. 6, for a given level of variance, the dataset visualized in the cartogram and the choropleth was the same: the same states were colored red and blue, and the alpha value of each state in the choropleth was proportional to the corresponding area value in the cartogram, such that, for variation function $V$, cartogram $i(ca)$ and choropleth $i(ch)$:

$$\forall i \mid V_{i(ca)} = V_{i(ch)}, \quad a_{i(ca)} = c_1 \alpha_{i(ca)} \tag{4}$$

Finally, the red and blue base colors had equal lightness ($L_r = L_b$), so that neither color stood out more to the participants.

The maps displayed datasets with six levels of data variation at increments such that, for the dataset of size $n$ with mean $\bar{x}$ and $0 \le k < 6$:

$$-\log\left[\frac{1}{n\bar{x}} \sum_{i=1}^{n} |x_i - \bar{x}|\right] = 1.056 + 1.38k \tag{5}$$
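The quantity inside the logarithm in Eq. (5) is the mean absolute deviation divided by the mean. A minimal sketch of that statistic follows; the function name `normalized_mad` is an assumption, and the input is any list of per-bin values.

```python
def normalized_mad(xs):
    """Mean absolute deviation of xs divided by the mean of xs."""
    n = len(xs)
    if n == 0:
        raise ValueError("dataset is empty")
    mean = sum(xs) / n
    if mean == 0:
        raise ValueError("mean is zero; the normalized statistic is undefined")
    return sum(abs(x - mean) for x in xs) / (n * mean)


print(normalized_mad([10, 10, 10, 10]))  # no variation
print(normalized_mad([1, 3]))            # deviations of 1 around a mean of 2
```

Dividing by the mean makes the statistic unitless, so datasets measured on different scales can be compared.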

The datasets were generated randomly such that they satisfied the control conditions, and then adjusted randomly with constraints to approach the variance condition.

One important question in designing this study was how to define perceptual efficacy. Different perceptual studies use different metrics and methods to determine this, including judging participants’ ability to compare a metric between two geographic bins, ability to estimate the value of the metric in a single bin (MacEachren 1982) [8], and retention of the data after seeing the map (Rittschof & Kulhavy 1998) [11]. However, this study uses comparative accuracy over a single binary variable (encoded in red/blue), since it is a simpler classification task as well as one that is often the goal of such visualizations, including election maps.

Figure 7: Accuracy of participants’ classifications of cartograms and choropleth maps at different levels of data variation, measured by mean absolute deviation

Figure 8: Accuracy of participants’ classifications of cartograms and choropleth maps at different levels of data variation, measured by variance

5 RESULTS

Fig. 7 shows the average accuracy of the participants’ understanding of both choropleth maps and non-contiguous cartograms. On the most basic level, it is clear that choropleths become less effective as variation increases, but the efficacy of cartograms remains approximately constant. Logarithmic regression (cartogram $R^2 = 0.96$, choropleth $R^2 = 0.89$) reveals that cartograms are the more effective visualization if:

$$\frac{1}{n\bar{x}} \sum_{i=1}^{n} |x_i - \bar{x}| > 0.3 \tag{6}$$

or, in words, if the mean absolute deviation normalized by the mean is greater than 0.3.

Calculating the variance of each visualized dataset yields results with a similar shape (Fig. 8). In this case, cartograms become more effective if:

$$\frac{1}{n\bar{x}} \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0.19 \tag{7}$$
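The variance-based condition in Eq. (7) can be turned into a small decision helper. This is only a sketch: the function names are hypothetical, and the 0.19 threshold comes from this study's setting (non-contiguous cartograms of US states and a binary red/blue classification task), so it may not transfer to other maps or tasks.

```python
def normalized_variance(xs):
    """Left-hand side of Eq. (7): sum of squared deviations divided by n times the mean."""
    n = len(xs)
    if n == 0:
        raise ValueError("dataset is empty")
    mean = sum(xs) / n
    if mean == 0:
        raise ValueError("mean is zero; the normalized statistic is undefined")
    return sum((x - mean) ** 2 for x in xs) / (n * mean)


def prefers_cartogram(xs, threshold=0.19):
    """True if Eq. (7) suggests a cartogram, False if it suggests a choropleth."""
    return normalized_variance(xs) > threshold


print(prefers_cartogram([10, 10, 10, 10]))  # uniform data
print(prefers_cartogram([1, 3]))            # widely spread data
```

The threshold is exposed as a parameter so that a different empirically derived cutoff can be substituted.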

Table 1: Accuracy of 100 participants at classifying choropleth maps and cartograms at six levels of variation

MAD/µ    Choropleth Accuracy    Cartogram Accuracy
0.018    0.93                   0.68
0.025    0.7374                 0.61
0.034    0.59                   0.65
0.047    0.59                   0.71
0.064    0.32                   0.71
0.088    0.14                   0.58

Table 2: Number of correct classifications out of six possible by each participant, choropleth maps and cartograms

Correct Classifications    Choropleth    Cartogram
0                          0             2
1                          2             7
2                          11            19
3                          49            22
4                          32            29
5                          4             17
6                          2             4
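The per-participant counts in Table 2 can be summarized with a short script. This is an illustrative recomputation directly from the table; it uses the population standard deviation (dividing by the number of participants), which is an assumption, and its results need not match summary figures reported in the text if those were computed differently.

```python
import math

# Number of participants by count of correct classifications (from Table 2).
CHOROPLETH = {0: 0, 1: 2, 2: 11, 3: 49, 4: 32, 5: 4, 6: 2}
CARTOGRAM = {0: 2, 1: 7, 2: 19, 3: 22, 4: 29, 5: 17, 6: 4}


def summarize(counts, maps_shown=6):
    """Return (participants, mean accuracy, std of accuracy) as fractions of maps_shown."""
    n = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / n
    # Population variance: an assumption about how spread is measured.
    var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / n
    return n, mean / maps_shown, math.sqrt(var) / maps_shown


for name, counts in (("choropleth", CHOROPLETH), ("cartogram", CARTOGRAM)):
    n, mean_acc, std_acc = summarize(counts)
    print(f"{name}: n={n}, mean accuracy={mean_acc:.3f}, std={std_acc:.3f}")
```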

The average error across all 6 datasets, for sample $j$ and participant $i$, was

$$\sum_{j=1}^{6} \sqrt{\frac{1}{\bar{x}} \sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.0446 \tag{8}$$

The distribution of individual accuracy was slightly left-skewed but approximately normal across all 12 maps seen (Fig. 10). On average, participants correctly classified 68% of the presented visualizations, or slightly more than 8 of the 12 shown. The standard deviation for the combined accuracy distribution is 0.1738.

Deconstructing the accuracy distribution into cartograms and choropleth maps reveals a difference between the participants’ perceptual accuracy of the two projections. The accuracy in judging cartograms varied widely across all participants, but the accuracy in judging choropleths was fairly consistent (Fig. 11). The standard deviation of the cartogram individual accuracy distribution was 0.1512, with 54% correctly classifying either 6 or 7 of the 12 maps; for choropleth maps, the standard deviation was 0.203, with more than 80% of the participants correctly classifying 6 or 7 maps.

One possible reason for this difference is the viewer’s familiarity with the visualization method. If the viewer has more familiarity with cartograms, there may be a lower perceptual cost of the area distortion; by contrast, due to the ubiquity of choropleth maps, the participants will have a more equal understanding of how to interpret them.

6 DISCUSSION

Qualitatively, it is understandable that choropleth maps become less effective than cartograms at greater data variation given the framework outlined in the previous sections. As the variation in the data increases, it becomes more difficult to understand the dataset when visualized by the choropleth map because the viewers must rely more heavily on their perception of the saturation-based metric. The perceptual accuracy does not decrease for the cartogram, however, since the viewer uses the area-based metric with equal weight regardless of the variation in the data, and the perceptual cost of the non-contiguous cartogram does not significantly increase as the cartogram becomes increasingly distorted.

The fact that the perceptual accuracy of the cartograms did not decrease significantly at higher levels of data variation suggests that once the geographic areas had been resized slightly at low levels of variation, increasing the level of distortion in the map did not decrease perceptual accuracy.

Figure 9: Accuracy including error, calculated as $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Figure 10: Number of participants by percentage of all maps classified correctly

6.1 Evaluating existing map projections

Sun & Li (2010) found that cartograms were a generally more effective means for visualizing the 2008 United States presidential election map. According to the condition outlined above, cartograms are more effective for $k > 0.3$ where

$$k = \frac{1}{n\bar{x}} \sum_{i=1}^{n} |x_i - \bar{x}| \tag{9}$$

For the 2008 election dataset, $k = 0.5989$, so the Sun & Li result is consistent with the results of this study.

Dunn (1988) [4] studied choropleth maps and framed rectangle charts using a dataset of state-by-state murder rates in 1979, and concluded that framed rectangle charts were more perceptually effective; however, for that dataset (from Gale & Halperin 1982) [5], $k = 0.4468$, suggesting cartograms would have been a more effective visualization for that data.

Another commonly-seen bivariate map projection, usually visualized with a choropleth map, is county-by-county election results. Due to the extreme variation in county populations (which range from fewer than 100 people to over 10 million), $k = 27.7874$ for this dataset during the 2016 election, making choropleth maps a perceptually inaccurate technique. Election results are also visualized across congressional districts; since congressional districts are designed to have nearly equal populations, $k < 0.01$, making a choropleth map an effective choice for this dataset.

Figure 11: Individual accuracy distributions, cartograms and choropleth maps

6.2 Other impactful data features

There are other features of the data involved in determining perceptual accuracy of the two visualization techniques. The graph of survey results reveals that choropleth maps not only become less effective as variation increases, they become even less effective than a random choice. One possible explanation for this is that the total area encompassed by each color, independent of alpha value, was not held constant (as it was in the cartograms).

The variation of the areas of each geographic bin is clearly another important factor in comparing the efficacy of cartograms and choropleth maps. The geographical area of each bin can be treated as another visualized variable, in the same way as alpha or color. Since cartograms negate the need for mentally normalizing a metric across areas, and the perceptual cost of area distortion does not increase as variation increases, increased variation in the area of each bin would most likely not affect the perceptual efficacy of the cartogram. For choropleths, this is not the case. At low levels of area variation, the choropleth would be an effective choice because the viewer does not have to consider area when interpreting the map; if all areas are equal, areas with a lower alpha value (closer to white, in the cases above) always represent a lower value in the quantitative field than areas with a higher alpha value. Consider Fig. 12a, with area variation = 0; in this case, it is clear to the viewer that A1 = A2 and B1 > B2 simply by comparing the colors. However, if areas are not equal, the viewer must include the differences in area in their visual analysis of the chart. In Fig. 12b, it is not immediately clear that A1 = A2 or that B1 > B2, a confusion that increases as the area variance increases.

Another possible variable that would influence the perceptual accuracy of the two projections is the familiarity of the viewers with the basemap. If the viewers are very familiar with the underlying geography, as in the examples with US states, they have a prior understanding of the areas of each geographic bin, which influences their perception of the area once it is resized. Theoretically, basemap familiarity would have relatively little effect on the perceptual accuracy of choropleth maps, but a greater impact on cartograms. If the cartogram viewer has very little familiarity with the original shapes of the geographic bins, the area distortion may not have a significant effect on perception; however, it is possible that the whitespace between areas causes the perceptual difficulties and thus viewer basemap familiarity would not impact the perceptual accuracy.

Figure 12: Choropleth maps at different area variations. (a) Area variation = 0; (b) Area variation > 0

7 FUTURE WORK

7.1 Other purposes for geographic visualization design

There are other features of visualization that this study does not account for, such as other possible purposes of the visualization design. While this study optimizes for comparative accuracy between two options, a number of other purposes could serve as metrics for perceptual efficacy, and adopting them would affect several parts of this study.

Instead of displaying the cumulative effect of trends across an entire geographic area, the maps could be created to highlight outliers in the dataset; in this case, standard deviation or variance may replace mean absolute deviation in the calculation of dataset variation. Since the $|x - \bar{x}|$ term in the calculation of mean absolute deviation becomes $(x - \bar{x})^2$ in the calculation of variance, outliers far from the mean will have a larger effect on the calculated variation.
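The difference between the two measures can be illustrated with a small sketch. The values below are hypothetical and are not taken from the survey datasets, and the helper names are introduced here only for illustration; the study's own variation formula may include normalization that is not reproduced in this sketch.

```python
from statistics import fmean, pvariance


def mean_absolute_deviation(values):
    """Average of |x - mean| over all values."""
    mean = fmean(values)
    return fmean(abs(x - mean) for x in values)


def outlier_share(values, index, power):
    """Fraction of the summed deviation terms contributed by values[index].

    Each term is |x - mean| ** power, so power=1 matches the terms of the
    mean absolute deviation and power=2 matches the terms of the variance.
    """
    mean = fmean(values)
    terms = [abs(x - mean) ** power for x in values]
    return terms[index] / sum(terms)


# Hypothetical per-bin values: nine similar bins and one outlier (last entry).
values = [10, 11, 9, 10, 12, 8, 10, 11, 9, 50]

mad = mean_absolute_deviation(values)            # mean absolute deviation
var = pvariance(values)                          # population variance
share_mad = outlier_share(values, -1, power=1)   # outlier's share under MAD
share_var = outlier_share(values, -1, power=2)   # outlier's share under variance

print(f"MAD = {mad:.2f}, variance = {var:.2f}")
print(f"outlier share: {share_mad:.2f} (absolute) vs {share_var:.2f} (squared)")
```

With these hypothetical values, the single outlier contributes half of the summed absolute deviations but roughly 89% of the summed squared deviations, which is the sense in which a variance-based measure would weight outliers more heavily.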

If, instead of showing the cumulative effect of a variable across an area, the maps were designed to allow the viewer to compare values across two areas within the broader context of the map, the question asked of the participants would change from a cumulative classification task to a focused classification between two areas. If the goal were to highlight a geographic trend, generating the survey datasets could involve a more strongly geographically-dependent condition. This study could be effectively extended to cover any of these possible visualization purposes.

7.2 Other factors that influence perceptual accuracy

Section 6.2 outlines a few other features of the visualized dataset that could have an effect on the relative perceptual accuracy of choropleths and cartograms. A future paper to expand this research could address the effect of any of these features.

Studying the effect of the variation in area of each geographic bin would be a challenge, as it is difficult to find geographic areas with bin sizes at different levels of variance which still maintain the viewer's terrain familiarity. For example, the sample dataset could be compared across counties, states, and countries; however, it would be nearly impossible to account for the variation in basemap familiarity, as most participants will be more familiar with the shapes of the US states than with the shapes of each county. There would have to be a number of additional controls as well, such as the number of geographic bins in each projection.

Studying the effect of terrain familiarity would be worthwhile as well, but this poses a similar challenge in finding geographic sets at varying levels of familiarity. This is compounded by the problem that not all viewers have the same familiarity when approaching the same dataset, and that this familiarity is neither easily quantifiable nor always known by a visualization designer.

It is also possible to improve this research by including more controls. For example, a study could control the total area of each choropleth map occupied by each color across all variation levels, which could be achieved by satisfying the following condition for each map:

$$\sum_i a_i r_i - \sum_i a_i b_i = c_1 \tag{10}$$
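A condition of this form could be checked for a set of candidate maps as sketched below. This is a minimal sketch under stated assumptions: it assumes that $a_i$ is the area of bin $i$ and that $r_i$ and $b_i$ are the per-bin intensities of the two colors, which is a reading of the symbols rather than a definition taken from this section. The function names and the example values are hypothetical.

```python
import math


def color_area_balance(areas, r, b):
    """Left-hand side of the condition: sum_i a_i*r_i - sum_i a_i*b_i.

    ASSUMPTION: areas[i] is a_i, r[i] is r_i, and b[i] is b_i for bin i.
    """
    if not (len(areas) == len(r) == len(b)):
        raise ValueError("areas, r, and b must have one entry per bin")
    weighted_r = sum(a * ri for a, ri in zip(areas, r))
    weighted_b = sum(a * bi for a, bi in zip(areas, b))
    return weighted_r - weighted_b


def satisfies_condition(maps, c1, tol=1e-9):
    """True if every (areas, r, b) map has a balance equal to c1 within tol."""
    return all(
        math.isclose(color_area_balance(*m), c1, abs_tol=tol) for m in maps
    )


# Hypothetical maps over the same three bins at two variation levels.
areas = [4.0, 2.0, 1.0]
low_variation = (areas, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
high_variation = (areas, [0.75, 0.0, 1.0], [0.25, 1.0, 1.0])
unbalanced = (areas, [1.0, 0.0, 0.0], [0.0, 0.0, 0.0])

print(satisfies_condition([low_variation, high_variation], c1=0.0))
print(satisfies_condition([low_variation, unbalanced], c1=0.0))
```

In this sketch the first two maps differ in how the color values vary across bins yet share the same area-weighted balance, while the third does not; a dataset generator could reject or adjust candidates that fail such a check.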

7.3 Other possible projections

This study could also be easily expanded to include other types of map projections. The perceptual accuracy of Dorling cartograms and contiguous cartograms would also be worthwhile to study in this way, along with a number of other projections such as proportional symbol maps or dot maps. All of these projections can be used to visualize bivariate datasets, but studying the accuracy of these projections for univariate datasets is also possible.

8 CONCLUSION

This study reveals the effect of variance on the perceptual accuracy of both choropleth maps and non-contiguous cartograms for bivariate datasets, and provides a simple metric for determining the more effective projection given an input dataset. More generally, however, it shows the importance of considering the dataset when evaluating the benefits of geographic visualizations. Comparative analysis of two projections using only a single input dataset is clearly insufficient regardless of the question for analysis. An investigation of the theoretical benefits of each projection in question, as well as of how features of the data affect those benefits, is invaluable in understanding the relative perceptual accuracy of the two projections.

REFERENCES

[1] W. S. Cleveland and R. McGill. Graphical perception: Theory, experi-mentation, and application to the development of graphical methods.Journal of the American Statistical Association, 79, 1984. doi: 10.2307/2981473

[2] D. Dorling. Area cartograms: their use and creation. Concepts andtechniques in modern geography, 59, 1996.

[3] R. Dunn. Variable-Width Framed Rectangle Charts for StatisticalMapping, journal = The American Statistician, year = 1987, volume =41, optnumber = 2, optpages = 153-156, doi = 10.2307/2684235, note= ,.

[4] R. Dunn. Framed Rectangle Charts or Statistical Maps with Shading:An Experiment in Graphical Perception. The American Statistician, 42,1988. doi: 10.2307/2684484

[5] N. Gale and W. Halperin. A case for better graphics: The unclassedchoropleth map. The American Statistician, 36, 1982.

[6] M. T. Gastner, M. E. J. Newman, and M. F. Goodchild. Diffusion-Based Method for Producing Density-Equalizing Maps. Proceedingsof the National Academy of Sciences of the United States of America,101, 2004.

[7] T. Griffin. Recognition of area units on topological cartograms. TheAmerican Cartographer, 10, 1983.

[8] A. MacEachren. The Role of Complexity and Symbolization Methodin Thematic Map Effectiveness. Annals of the Association of AmericanGeographers, 72, 1982.

[9] J. Mackinlay. Automating the design of graphical presentations ofrelational information. Acm Transactions On Graphics (Tog), 5, 1986.

[10] M. Newman. Maps of the 2016 us presidential election results, Novem-ber 2016.

Page 7: Impact of Data Variation on Efficacy of Cartograms and ... · data variation on graphical perception of cartograms and choropleth maps, and validates this theory based on a user

[11] K. A. Rittschof and R. W. Kulhavy. Learning and Remembering fromThematic Maps of Familiar Regions. Educational Technology Researchand Development, 46, 1998.

[12] R. Roth, A. W. Woodruff, and Z. F. Johnson. Value-by-alpha maps: Analternative technique to the cartogram. The Cartographic Journal, 47,2010.

[13] J. Steward and P. Kennelly. Illuminated Choropleth Maps. Annals ofthe Association of American Geographers, 100, 2010.

[14] H. Sun and Z. Li. Effectiveness of Cartogram for the Representa-tion of Spatial Data. Cartographic Journal, 47, 2010. doi: 10.1179/000870409X12525737905169

[15] W. Tobler. Thirty-Five Years of Computer Cartograms. Annals of theAssociation of American Geographers, 94, 2004.

[16] N. Yau. Alternative to cartograms using transparency, November 2008.