Making sense of data visually: a modern look at data visualization

Post on 13-Jul-2015


Making sense of data visually:

A modern look at data visualization

VLADIMIR MILEV

NEW VENTURE SOFTWARE

Author Bio

Vladimir Milev

MCPD Enterprise

Speaker (Devreach, NTK Slovenia and others)

DV Evangelist

Founder at New Venture Software

@vmilev

www.linkedin.com/in/vladimirmilev/

http://www.newventuresoftware.com/

Agenda

1. Big data and information overload

2. What problems DataViz solves

3. DataViz fundamental theory

4. Basic visualizations

5. Advanced visualizations

Information Overload

Twitter: 500 million tweets per day

Facebook: 55 million status updates per day

Facebook: 900 million interactions per day (comments, likes etc.)

Reddit:

Proliferation of smart devices

We are already living in a world dominated by smart devices

What is the meaning of this?

• More connected, data is more accessible

• Less space for tables and text

• Must use visual communication

Making Sense of Data

Increasing amount of data available

Increasing number of data consumer devices

Obtaining data no longer a problem

We have an Information Overload issue

Quick data analysis is the new problem

But how quick?

A picture is worth 1000 words

“With about 1,000,000 ganglion cells, the human retina would transmit data at roughly the rate of an Ethernet connection, or 10 million bits per second.”

-Vijay Balasubramanian, PhD, Professor of Physics at U Penn

OK – That’s a lot of bandwidth

BUT ARE WE USING IT EFFICIENTLY?

Efficiency

Best readers usually read up to about 300 words per minute.

Average word length is 5.1 letters

300 * 5.1 = 1530 characters per minute

Or 1530 / 60 = 25.5 characters per second

1 character is usually stored as 8 bits

Rounding 25.5 up to 26: 26 * 8 = 208 bits per second

Reading bandwidth is ~0.025 KiB/s

Or 0.00208% Efficiency
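The arithmetic on this slide can be checked in a few lines. This is a minimal sketch that takes the slide's figures as given (300 words per minute, 5.1 letters per word, 8 bits per character, and 10 million bits per second for the retina); the constant and variable names are illustrative, and the character rate is rounded up to 26 as the slide does.

```python
import math

# Figures taken from the slides; names are illustrative.
WORDS_PER_MINUTE = 300
LETTERS_PER_WORD = 5.1
BITS_PER_CHAR = 8
RETINA_BITS_PER_SECOND = 10_000_000

chars_per_minute = WORDS_PER_MINUTE * LETTERS_PER_WORD
chars_per_second = chars_per_minute / 60

# The slide rounds 25.5 characters per second up to 26 before converting to bits.
reading_bits_per_second = math.ceil(chars_per_second) * BITS_PER_CHAR

# bits -> bytes -> KiB
reading_kib_per_second = reading_bits_per_second / 8 / 1024

# Share of the retina's estimated bandwidth that reading uses, in percent.
efficiency_percent = reading_bits_per_second / RETINA_BITS_PER_SECOND * 100

print(f"{reading_bits_per_second} bits/s")
print(f"{reading_kib_per_second:.3f} KiB/s")
print(f"{efficiency_percent:.5f} % of retinal bandwidth")
```

Under these assumptions the result reproduces the slide's 208 bits per second and roughly 0.002% efficiency.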

So reading clearly isn’t the way to go…

BUT WHAT IS THE SOLUTION?

Using statistics

For most of the 20th century

Using the arithmetic mean (average) and standard deviation

Variance, correlations, regressions

Turns out this is not good enough

Anscombe’s Quartet

      I             II            III            IV
  x      y      x      y      x      y      x      y
 10    8.04    10    9.14    10    7.46     8    6.58
  8    6.95     8    8.14     8    6.77     8    5.76
 13    7.58    13    8.74    13   12.74     8    7.71
  9    8.81     9    8.77     9    7.11     8    8.84
 11    8.33    11    9.26    11    7.81     8    8.47
 14    9.96    14    8.10    14    8.84     8    7.04
  6    7.24     6    6.13     6    6.08     8    5.25
  4    4.26     4    3.10     4    5.39    19   12.50
 12   10.84    12    9.13    12    8.15     8    5.56
  7    4.82     7    7.26     7    6.42     8    7.91
  5    5.68     5    4.74     5    5.73     8    6.89

• Statistical properties are nearly identical:

◦ The mean of X (9.0) and the mean of Y (7.5) are the same in all four sets

◦ Nearly the same variances, correlations and regressions

• As far as statistics is concerned, these sets are almost the same
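The claim about the quartet can be verified directly from the numbers on the slide. The sketch below uses only the Python standard library; the `pearson` and `summarize` helpers are written here for illustration and are not part of the original talk.

```python
from statistics import mean, variance

# Data as listed on the slide; sets I to III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that the slide says are (nearly) shared."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARY.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four summaries come out almost indistinguishable, which is exactly why the sets have to be plotted before their differences become visible.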

Anscombe’s Quartet

So DataViz is very powerful

But why does it work so well?

Gestalt Psychology

Seeing with the brain

The mind understands external stimuli as a whole rather than as the sum of their parts

We tend to order our experience in a manner that is regular, orderly, symmetric, and simple

Key principles of gestalt: reification, multistability, invariance

Gestalt laws of grouping: proximity, similarity, closure, symmetry

Gestalt Principles - Reification

Our minds tend to construct/generate information

Gestalt Principles - Multistability

The tendency of our mind to jump back and forth between ambiguous alternative interpretations

Examples: Spinning Girl, Rubin Vase

Gestalt Principles - Invariance

The tendency to perceive simple geometric objects independent of rotation, translation, and scale

Also elastic deformations, different lighting, and different component features

Gestalt Laws of Grouping - Similarity

We group objects based on visual similarity

Gestalt Laws of Grouping - Proximity

We group items based on spatial proximity

Gestalt Laws of Grouping - Closure

We perceive objects such as shapes, letters, pictures, etc., as being whole when they are not complete

Application in Data Visualization

Introducing the visual variables

Fundamental properties of objects which can encode information into a picture

Fundamental visual variables:

◦ Position

◦ Size

◦ Color

◦ Shape

◦ Orientation

Basis for all Data Visualization!
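One way to picture the visual variables is as the fields of a drawable mark, each fed by one field of a data record. The sketch below is illustrative only: the `Mark` class, the `linear_scale` helper, the lookup tables and the record fields (`age`, `income`, and so on) are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    x: float            # position (horizontal)
    y: float            # position (vertical)
    size: float         # size, e.g. a radius in pixels
    color: str          # color, as a hex string
    shape: str          # shape name, e.g. "circle"
    orientation: float  # orientation in degrees


def linear_scale(value, domain, output):
    """Map value from the data domain onto an output range, linearly."""
    (d0, d1), (o0, o1) = domain, output
    return o0 + (value - d0) / (d1 - d0) * (o1 - o0)


# Hypothetical lookup tables for categorical fields.
COLORS = {"female": "#d95f02", "male": "#1b9e77"}
SHAPES = {"survey": "circle", "census": "square"}


def encode(record):
    """Encode one hypothetical record into the five visual variables."""
    return Mark(
        x=linear_scale(record["age"], (0, 100), (0, 400)),
        # Screen y grows downward, so the output range is reversed.
        y=linear_scale(record["income"], (0, 200_000), (300, 0)),
        size=linear_scale(record["household"], (1, 10), (2, 20)),
        color=COLORS[record["gender"]],
        shape=SHAPES[record["source"]],
        orientation=0.0 if record["employed"] else 45.0,
    )


mark = encode({
    "age": 50, "income": 100_000, "household": 1,
    "gender": "female", "source": "survey", "employed": True,
})
print(mark)
```

Quantitative fields go through a scale while categorical fields go through a lookup; the charts later in the deck differ mainly in which variables they choose to use.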

Basic/Common Visualizations

Bar graphs

Line graphs

Area charts

Pie charts

Bar Graphs

• Using color correctly to encode gender

• Using position (ordering) to create an orderly scale

• Using size to encode the values

• Using orientation to differentiate gender again

Bar Graphs continued

• Labels are used

• Color is neutral and does not encode information

• Again, we have top-down ordering (position)

• And again size encodes the relative numeric value

Bars and Normal Distribution

Minimum passing grade

• Distribution of test scores for Polish “Matura” exam

• Normal Distribution is expected

• Red line shows normal distribution

• 30 is the minimum expected grade

• Detecting behavioral changes

• What happened?

Line Graphs

Confirming what we already know: paper media is declining rapidly.

• Shape encodes the value

• Color is not significant

• Design goal is to show a trend/change

Area Graphs

Effect of school year on Team Fortress 2 players

School starts

• Similar to a line graph

• Design goal for area charts is to emphasize the value/quantity rather than the trend

• You can see both

• Color has no meaning

Area Graphs continued

• This time color carries a meaning (legend)

• The graph is also good for displaying the ratio between data series over time

Pie Charts

Golden Rules for Pie Charts

• Ratio of one piece to the whole

• Order the values

• Fewer than 6 pieces

• Avoid legends

• Sum up to 100%
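The rules that depend only on the numbers can be checked before a chart is drawn. The function below is a hypothetical helper written for illustration, not something from the talk; it assumes the slices are given as percentages and that "order the values" means largest first. The rules about legends and part-to-whole meaning are design decisions and are not covered.

```python
def check_pie(values, max_slices=5, tolerance=0.01):
    """Return the golden rules that a list of slice percentages breaks."""
    problems = []
    if len(values) > max_slices:  # "fewer than 6 pieces"
        problems.append("too many slices")
    if abs(sum(values) - 100) > tolerance:  # "sum up to 100%"
        problems.append("does not sum to 100%")
    # Assumption: ordered means descending, largest slice first.
    if list(values) != sorted(values, reverse=True):
        problems.append("slices are not ordered")
    return problems


print(check_pie([50, 30, 20]))
print(check_pie([40, 30, 10, 10, 5, 5]))
print(check_pie([20, 30, 50]))
```

An empty list means the data passes the numeric rules; anything else names the rule that would be broken.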

Abusing Pie Charts

Don’t break the rules!

Maps

Plot millions of journal entries from 18th and 19th century ship logs, and you reveal a picture of ocean trade you've never seen before

• Visualization of routes

• Color saturation indicates heavily used routes

Maps are good with animations too

• Concentration of NO2 from 2005 to 2011

• Using both color and position to encode concentration

• Using continuous color scale

• Adding another dimension - time
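A continuous color scale can be sketched as linear interpolation between two endpoint colors. The endpoint colors and the function names below are assumptions chosen for illustration; the map in the talk may well use a different palette or a non-linear scale.

```python
def lerp_color(t, start=(250, 250, 200), end=(180, 0, 40)):
    """Blend two RGB colors; t=0 gives start, t=1 gives end."""
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


def color_for(value, lo, hi):
    """Map a measurement in [lo, hi] to an RGB color on the scale."""
    return lerp_color((value - lo) / (hi - lo))


# Hypothetical concentration readings on a 0 to 10 scale.
for reading in (0, 5, 10, 20):
    print(reading, color_for(reading, 0, 10))
```

Drawing one frame per time step with the same `lo` and `hi` keeps colors comparable across the animation.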

Choropleth Maps

Displaying the most popular name for a newborn in each state

• Using discrete palette to encode information

Heat Maps

• Excellent for plotting recurring values

• Color saturation/brightness encodes the values

• Position also encodes information

• Easy to spot concentrations and find patterns
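For recurring values, the data behind a heat map is often just a grid of counts. The sketch below assumes events are given as hypothetical (weekday, hour) pairs and scales each cell to an intensity between 0 and 1, which a renderer could then turn into saturation or brightness; the function name and grid shape are illustrative.

```python
from collections import Counter


def heatmap_grid(events, rows=7, cols=24):
    """Count (row, col) events and scale each cell to the range 0..1."""
    counts = Counter(events)
    peak = max(counts.values(), default=0)
    return [
        [counts[(r, c)] / peak if peak else 0.0 for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical events: two on Monday at 09:00, one on Tuesday at 17:00.
grid = heatmap_grid([(0, 9), (0, 9), (1, 17)])
print(grid[0][9], grid[1][17], grid[6][23])
```

Position in the grid carries the two recurring dimensions, and the cell intensity carries the value.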

Heat Maps medicine/genetics

Tree Maps

• Excellent for representing hierarchical data

• Color carries a meaning

• Size carries a meaning as well

• Position is irrelevant

• Suitable for annotations
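How size encodes value in a tree map can be shown with the simple slice-and-dice layout, which splits a rectangle among children in proportion to their totals and alternates the split direction at each level. This is one possible layout, not necessarily the one used in the talk's figure; the dictionary-based tree format and function names are assumptions.

```python
def total(node):
    """Sum of all leaf values under a node."""
    if "children" in node:
        return sum(total(child) for child in node["children"])
    return node["value"]


def layout(node, x, y, w, h, depth=0):
    """Return (name, x, y, w, h) rectangles for every leaf of the tree."""
    if "children" not in node:
        return [(node["name"], x, y, w, h)]
    rects = []
    whole = total(node)
    offset = 0.0  # fraction of the parent already handed out
    for child in node["children"]:
        share = total(child) / whole
        if depth % 2 == 0:  # even levels split along the width
            rects += layout(child, x + offset * w, y, share * w, h, depth + 1)
        else:               # odd levels split along the height
            rects += layout(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return rects


# Hypothetical hierarchy for illustration.
tree = {"name": "root", "children": [
    {"name": "a", "value": 50},
    {"name": "b", "children": [
        {"name": "b1", "value": 25},
        {"name": "b2", "value": 25},
    ]},
]}
rects = layout(tree, 0, 0, 100, 100)
for rect in rects:
    print(rect)
```

Each leaf's area is proportional to its value, while where it lands depends only on the order of the children, which is why position carries no meaning of its own.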

Parallel Coordinates Plot

• Interactive visualization

• Good at displaying relationships between different dimensions of data

• Position encodes dimension

• Color encodes scale

Parallel Coordinates Plot – in action

Selecting a subset of a dimension to display the relationships with the other dimensions

Chord Diagram

• Similar to Parallel Coordinates plot

• Color and Position used to encode data

• Design is different

• Filtering of dimensions is not a design goal

• Focuses on selecting a whole dimension

Some resources

http://www.reddit.com/r/dataisbeautiful/

http://blog.visual.ly/

http://flowingdata.com/

http://eagereyes.org/

http://www.perceptualedge.com/blog/

Thank You!
