data visualization summary ihub

Post on 16-Aug-2015

Data Visualization Nikhil Srivastava, 2015

Nikhil Srivastava

iHub Summer Data Jam

About this Lecture

• Shortened version of longer course

• Course website

– Slides, demos, extra material

– Code samples and libraries

– Final projects

Effective Data Visualization

Nikhil Srivastava

nsrivast@gmail.com

0713 987 262

I build products & businesses in the fields of finance & technology.

I organize & visualize information for teaching & understanding.

nikhilsrivastava.com

Outline

• What is Data Visualization? (introduction)

• Thinking and Seeing (foundation & theory)

• From Data to Graphics (building blocks)

• Principles and Guidelines (design & critique)

• Building Visualizations (construction)

• Advanced Topics (advanced)

What is Data Visualization? (introduction)

Data Visualization

• Information Visualization

• Scientific Visualization

• Infographics

• Statistical Graphics

• Informative Art

• Visual Analytics

• Art, Science, Statistics, Journalism, Design

City/Town       County        Population
Ahero           Kisumu            76,828
Athi River      Machakos         139,380
Awasi           Kisumu            93,369
Kangundo-Tala   Machakos         218,557
Karuri          Kiambu           129,934
Kiambu          Kiambu            88,869
Kikuyu          Kiambu           233,231
Kisumu          Kisumu           409,928
Kitale          Trans-Nzoia      106,187
Kitui           Kitui            155,896
Limuru          Kiambu           104,282
Machakos        Machakos         150,041
Molo            Nakuru           107,806
Mwingi          Kitui             83,803
Naivasha        Nakuru           181,966
Nakuru          Nakuru           307,990
Nandi Hills     Trans-Nzoia       73,626
Ruiru           Kiambu           238,858
Thika           Kiambu           139,853

• Which is the most populous city in the list?

• Which county in the list has the most cities?

• Which county in the list has the largest average city?

• What is the population of Limuru?

Data Visualization is:

• Useful

– Answers user questions

– Reduces user workload

(by design, not by default)
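The three questions asked about the city table can also be answered in code, which shows how much work a raw table leaves to the reader. This is a minimal sketch using only the rows from the table in these slides; the function name `summarize` is an illustrative choice, not part of the lecture.

```python
from collections import defaultdict

# (city, county, population) rows copied from the table in these slides
CITIES = [
    ("Ahero", "Kisumu", 76828),
    ("Athi River", "Machakos", 139380),
    ("Awasi", "Kisumu", 93369),
    ("Kangundo-Tala", "Machakos", 218557),
    ("Karuri", "Kiambu", 129934),
    ("Kiambu", "Kiambu", 88869),
    ("Kikuyu", "Kiambu", 233231),
    ("Kisumu", "Kisumu", 409928),
    ("Kitale", "Trans-Nzoia", 106187),
    ("Kitui", "Kitui", 155896),
    ("Limuru", "Kiambu", 104282),
    ("Machakos", "Machakos", 150041),
    ("Molo", "Nakuru", 107806),
    ("Mwingi", "Kitui", 83803),
    ("Naivasha", "Nakuru", 181966),
    ("Nakuru", "Nakuru", 307990),
    ("Nandi Hills", "Trans-Nzoia", 73626),
    ("Ruiru", "Kiambu", 238858),
    ("Thika", "Kiambu", 139853),
]


def summarize(rows):
    """Return (most populous city, county with most cities, county with largest average city)."""
    by_county = defaultdict(list)
    for _city, county, population in rows:
        by_county[county].append(population)

    most_populous = max(rows, key=lambda row: row[2])[0]
    most_cities = max(by_county, key=lambda c: len(by_county[c]))
    largest_average = max(by_county, key=lambda c: sum(by_county[c]) / len(by_county[c]))
    return most_populous, most_cities, largest_average


answers = summarize(CITIES)
print(answers)  # ('Kisumu', 'Kiambu', 'Nakuru')
```

A well-chosen chart should let the reader see these answers directly, without grouping and averaging by hand.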

Anscombe’s quartet (1973)

Data Visualization is:

• Important

– Understand structure and patterns

– Resolve ambiguity

– Locate outliers
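Anscombe's quartet makes this point numerically: four datasets with nearly identical summary statistics that look completely different when plotted. The sketch below uses the standard published values of the quartet (they are not reproduced in these slides) and hand-written helpers `mean` and `pearson` to show how little the summaries reveal.

```python
from math import sqrt

# Standard published values of Anscombe's quartet (1973); not taken from the slides
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


summary = {name: (mean(xs), mean(ys), pearson(xs, ys)) for name, (xs, ys) in ANSCOMBE.items()}
for name, (mx, my, r) in summary.items():
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

All four rows print almost the same numbers; only a plot exposes the curve in II and the outliers in III and IV.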

Data Visualization is:

• Important

– Design decisions affect interpretation

Data Visualization is:

• Powerful

– Communicate, teach, inspire

Data Visualization is:

• Relevant

– In one second …

– Open data, open technologies

– Growing use in business, education, media, advertising …

Definitions

• “the process that transforms (abstract) data into interactive graphical representations” [1]

• “finding the artificial memory that best supports our natural means of perception” [2]

• “visual representations of data to amplify cognition” [3]

• “giving information a visual representation” [4]

Course Scope

           Focus                     Extra
purpose    communicate               explore, analyze
data       numerical, categorical    text, maps, graphs, networks
feature    representation            animation, interactivity

Thinking and Seeing (foundation & theory)

Bandwidth of Our Senses

Why Vision?

The Hardware

The Software: Visual Perception

“Bottom-up”

• Sensory input

• Low-level features: orientation, shape, color, movement

• Rapid, parallel, automatic

“Top-down”

• High-level concepts: objects, symbols

• Involves working memory

• Slow, sequential, conscious

Task: Counting

How many 3’s?

1281768756138976546984506985604982826762 9809858458224509856458945098450980943585 9091030209905959595772564675050678904567 8845789809821677654876364908560912949686

Counting the 3’s in plain digits: slow, sequential, conscious

Counting the 3’s when they stand out visually: rapid, parallel, automatic

Task: (Distractor) Search

Which side has the red circle?

Task: Search

Which side has the red circle?

Slow, sequential, conscious

Rapid, parallel, automatic

Task: Unique Search

Slow, sequential, conscious

Rapid, parallel, automatic

(7)

(5)

(3)

Lessons for Visualization

• Use “pre-attentive” attributes when possible

– Color, shape, orientation (depth, motion)

– Faster, higher bandwidth

• Caveats

– Working memory: magical number 7 (+/- 2)

– Be careful mixing attributes

Example: Too Many Attributes

Eye != Camera

limited aperture

limited color

Saccades: limited time and location

Eye != Camera: Relative

[Figure: regions labeled A and B]

Eye != Camera: Knowledge

Lessons for Visualization

• Human vision has limits and constraints: aperture, color, time, location

• “What we see” depends on “what we know”

• Attention and experience matter

From Data to Graphics (building blocks)

• What kind of data do we have?

• How can we represent the data visually?

• How can we organize this into a visualization?

Visual Encoding

Data as Input

• DATA

• Clean, Restructure

• Explore, Analyze

• Visualization Goals

Model and Attribute

item      attribute A   attribute B   …   attribute M
item 1    value1_A      value1_B      …
item 2    value2_A      value2_B      …
…         …             …
item N                                    valueN_M

Data Types

             CATEGORICAL              ORDINAL               NUMERICAL (Interval)   NUMERICAL (Ratio)
Examples     Male / Female            Small / Med / Large   Latitude/Longitude     Length
             Asia / Africa / Europe   Low / High            Compass direction      Count
             True / False             Yes / Maybe / No      Time (event)           Time (duration)
Operations   =                        = < >                 = < > + -              = < > + - * /

Data Types: Example

ID   Gender   Test Score   Grade   Size     Temperature
1    Male     77           C       Small    36.5
2    Female   85           B       Large    37.2
3    Female   95           A       Medium   36.7
4    Male     90           A       Large    37.4

• Which are categorical? (=)

• Which are ordinal? (= < >)

• Which are interval? (= < > + -)

• Which are ratio? (= < > + - * /)
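One way to work through the example is to record, for each data type, the operations listed for it and then ask which columns support a given operation. The column classification below is one plausible reading and an assumption, not the lecture's stated answer: Temperature is taken to be in degrees Celsius (interval), and Test Score is treated as ratio, though a case can be made for interval.

```python
from enum import Enum


class DataType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Meaningful operations per type, as listed on the Data Types slide
OPERATIONS = {
    DataType.CATEGORICAL: {"="},
    DataType.ORDINAL: {"=", "<", ">"},
    DataType.INTERVAL: {"=", "<", ">", "+", "-"},
    DataType.RATIO: {"=", "<", ">", "+", "-", "*", "/"},
}

# Assumed classification of the example columns (not given in the slides)
COLUMN_TYPES = {
    "ID": DataType.CATEGORICAL,
    "Gender": DataType.CATEGORICAL,
    "Test Score": DataType.RATIO,
    "Grade": DataType.ORDINAL,
    "Size": DataType.ORDINAL,
    "Temperature": DataType.INTERVAL,  # assuming degrees Celsius
}


def columns_supporting(op):
    """Return the example columns for which the operation `op` is meaningful."""
    return sorted(name for name, kind in COLUMN_TYPES.items() if op in OPERATIONS[kind])


print(columns_supporting("<"))  # columns that can be ordered
print(columns_supporting("/"))  # columns where ratios make sense
```

Under these assumptions, only Test Score supports division: a score of 90 is meaningfully "twice" a score of 45, while 37.4 °C is not "twice as hot" as 18.7 °C.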

Data Type Transformation

             CATEGORICAL              ORDINAL               NUMERICAL (Interval)   NUMERICAL (Ratio)
Examples     Male / Female            Small / Med / Large   Time                   Time
             Asia / Africa / Europe   Low / High            Latitude/Longitude     Length
             True / False             Yes / Maybe / No      Compass direction      Count

• Binning/Categorizing

• Differencing/Normalization
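The two transformations can be sketched in a few lines. Binning turns a numerical value into an ordinal label; differencing turns interval values (event times) into ratio values (durations). The thresholds and the event times below are illustrative assumptions, not values from the lecture.

```python
def bin_population(population, small_max=100_000, large_min=200_000):
    """Binning: map a ratio value to an ordinal label (thresholds are assumed)."""
    if population < small_max:
        return "Small"
    if population < large_min:
        return "Med"
    return "Large"


def durations(event_times):
    """Differencing: gaps between consecutive event times are durations."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


# Three cities from the table in these slides
labels = {
    city: bin_population(population)
    for city, population in [("Ahero", 76828), ("Limuru", 104282), ("Kisumu", 409928)]
}
gaps = durations([9, 12, 20])  # hypothetical event times, in hours

print(labels)  # {'Ahero': 'Small', 'Limuru': 'Med', 'Kisumu': 'Large'}
print(gaps)    # [3, 8]
```

Binning discards detail on purpose, so the choice of thresholds is a design decision that affects what the reader sees.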

Advanced Data Types

• Networks/Graphs

– Hierarchies/Trees

• Text

• Maps: points, regions, routes

Visual Encodings

• Marks: point, line, area, volume

• Channels: position, size, shape, color, angle/tilt

Channel Effectiveness

“Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”

Color as a Channel

             Categorical      Quantitative
Hue          Good (6-8 max)   Poor
Value        Poor             Good
Saturation   Poor             Okay

type                        mark    channel           data represented
Scatter Plot                point   position          2 quantitative
Scatter + Hue               point   position, color   2 quantitative, 1 categorical
Scatter + Size (“Bubble”)   point   position, size    3 quantitative

Scatter Plot – Applications

CORRELATION, GROUPING, OUTLIERS

Scatter Plot – Dangers

OCCLUSION (DENSITY)

OCCLUSION (OVERLAP)

3-D

type         mark   channel                  data represented
Line Chart   line   position (orientation)   2 quantitative
Area Chart   area   size (length)            2 quantitative

Line Chart – Applications

PATTERN OVER TIME, COMPARISON

Line Chart – Dangers

Y SCALING

X SCALING

OVERLOAD

type        mark   channel         data represented
Bar Chart   line   size (length)   1 categorical, 1 quantitative
Histogram   line   size (length)   1 ordinal/quantitative, 1 quantitative (count)

Bar Chart – Applications

COMPARE CATEGORIES, DISTRIBUTION

Bar Chart – Dangers

TOO MANY CATEGORIES

POORLY SORTED
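Sorting is usually the cheapest fix for a bar chart. As a rough sketch, the function below encodes each value as bar length and orders the bars from largest to smallest; the county totals are summed from the city table in these slides, and the text rendering and the name `text_bar_chart` are illustrative choices only.

```python
def text_bar_chart(values, width=40):
    """Render a horizontal bar chart as text lines, largest value first."""
    top = max(values.values())
    lines = []
    for label, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        # Bar length is proportional to the value, with a zero baseline
        bar = "#" * max(1, round(width * value / top))
        lines.append(f"{label:<12} {bar} {value:,}")
    return lines


# Population of the listed cities, summed per county from the table in these slides
county_totals = {
    "Kisumu": 580125,
    "Machakos": 507978,
    "Kiambu": 935027,
    "Trans-Nzoia": 179813,
    "Kitui": 239699,
    "Nakuru": 597762,
}

lines = text_bar_chart(county_totals)
print("\n".join(lines))
```

With the bars sorted, the ranking of counties can be read top to bottom without comparing individual lengths.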

type        mark   channel        data represented
Pie Chart   area   size (angle)   1 quantitative

Pie Chart – Dangers

AREA SCALE, SIMILAR AREAS, OVERLOAD

Multi-Series Bar Charts

GROUPED BAR CHART

STACKED BAR CHART

Multi-Series Line Charts

MULTIPLE LINE

STACKED AREA CHART

Normalization

NORMALIZED BAR, NORMALIZED AREA
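A normalized bar or area chart plots each series as a share of its category's total instead of as a raw value, so every bar or column sums to 100%. A minimal sketch of that preprocessing step follows; the counts and the series names are hypothetical.

```python
def normalize(series):
    """Convert each category's raw values to percentage shares of that category's total."""
    shares = {}
    for category, parts in series.items():
        total = sum(parts.values())
        shares[category] = {name: 100 * value / total for name, value in parts.items()}
    return shares


# Hypothetical raw counts for two categories
raw = {
    "2014": {"web": 30, "mobile": 10},
    "2015": {"web": 45, "mobile": 45},
}

shares = normalize(raw)
print(shares["2014"])  # {'web': 75.0, 'mobile': 25.0}
print(shares["2015"])  # {'web': 50.0, 'mobile': 50.0}
```

Normalizing makes changes in composition easy to compare, but it hides changes in the totals, so the choice depends on which question the chart should answer.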

Small Multiples Chart

Advanced Charts

• Treemap (Hierarchical Data)

– Channels: ?

– Strengths: nested relationships

– Concerns: order vs aspect ratio

• Multi-Level Pie (Hierarchical Data)

– Channels: ?

– Strengths: nested relationships

– Concerns: readability

• Heat Map (Table/Field Data)

– Channels: ?

– Strengths: pattern/outlier detection

– Concerns: ordering/clustering

• Choropleth Map (Region Data)

– Channels: ?

– Strengths: geography

– Concerns: region size, color spectrum

• Cartogram (Region Data)

– Channels: ?

– Strengths: geographic pattern

– Concerns: base map knowledge

Principles and Guidelines (design & critique)

From Science to Art

• Design principles*

• Style guidelines*

*dependent on visualization context and objective (and author)

Design Principles

• Integrity

– Tell the truth with data

• Effectiveness

– Achieve visualization objectives

• Aesthetics

– Be compelling, vivid, beautiful

Integrity

Lie Ratio = (size of effect in graphic) / (size of effect in data)
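The ratio is easy to compute once "size of effect" is pinned down. The sketch below assumes it means relative change between two values, which is one common reading; the numbers are a hypothetical bar chart whose value axis starts at 45 instead of 0.

```python
def relative_change(first, second):
    """Relative change from `first` to `second` (0.1 means +10%)."""
    return (second - first) / first


def lie_ratio(graphic_first, graphic_second, data_first, data_second):
    """Size of the effect as drawn divided by the size of the effect in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(data_first, data_second)


# Hypothetical example: data values 50 and 55, drawn on an axis that starts at 45,
# so the bars are 5 and 10 units tall
ratio = lie_ratio(graphic_first=5, graphic_second=10, data_first=50, data_second=55)
print(round(ratio, 2))  # 10.0
```

A ratio near 1 means the graphic is faithful to the data; here a 10% increase is drawn as a 100% increase, so the chart overstates the effect tenfold.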

“show data variation, not design variation”

Effectiveness*

Data/Ink Ratio = (ink representing data) / (total ink)

*according to Tufte

avoid “chart junk”

Avoid Chart Junk

Effectiveness (Few)

Practical Guidelines

• Avoid 3-D charts

• Focus on substance over graphics

• Avoid separate legends and keys

• Faint grids/guidelines

• Avoid unnecessary textures and colors

Color Guidelines

• To label

• To emphasize

• To liven or decorate

Bad Color

Good Color

More Color Guidelines

• Use color only when necessary

• Use saturated colors for data labels, thin lines, small areas

• Use less saturated colors for large areas, backgrounds

• Use tools like ColorBrewer

Building Visualizations (construction)

What Software to Use?

• DATA

• Clean, Restructure

• Explore, Analyze

• Visualization Goals

Visualization Software

• Web friendly

– Highcharts

– InfoVis

– Processing

– D3

• Statistics

– Python (Matplotlib)

– R (ggplot2)

• Maps

– Google Charts

– Leaflet

– CartoDB

• Dashboards

• Graphs

– GraphViz

– Gephi

Highcharts - Reference

• Examples

– Hello, Chart

– Basic Charts

• Documentation, API

• Highcharts Cloud
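A Highcharts chart is described by a plain options object, so the data preparation can happen in any language. As a hedged sketch (not the lecture's own "Hello, Chart" example), the Python below builds such an object for the county totals summed from the table in these slides and serializes it to JSON; the helper name `bar_chart_options` and the chart title are assumptions.

```python
import json


def bar_chart_options(title, categories, values, series_name):
    """Build a Highcharts-style options dictionary for a single-series bar chart."""
    return {
        "chart": {"type": "bar"},  # "bar" draws horizontal bars, "column" vertical ones
        "title": {"text": title},
        "xAxis": {"categories": categories},
        "yAxis": {"title": {"text": series_name}},
        "series": [{"name": series_name, "data": values}],
    }


# County totals summed from the city table, sorted from largest to smallest
counties = ["Kiambu", "Nakuru", "Kisumu", "Machakos", "Kitui", "Trans-Nzoia"]
totals = [935027, 597762, 580125, 507978, 239699, 179813]

options = bar_chart_options("Population of listed cities by county", counties, totals, "Population")
options_json = json.dumps(options, indent=2)
print(options_json)
```

On a web page the resulting JSON would be passed to `Highcharts.chart('container', options)`, where `container` is the id of a page element chosen by the author.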

Advanced Topics (advanced)

Advanced Visualizations

• The Ebb and Flow of Movies (NY Times, 2008)

• Word Cloud of the “Data Visualization” Wikipedia page (Wordle)

• ZIPScribble (Robert Kosara, 2006)

• Twitter Networks (PJ Lamberson, 2012)

Blogs

• Infosthetics.com

• Visualizing.org

• FlowingData.com

Nikhil Srivastava

nsrivast@gmail.com
