

Page 1: LOGO Beautiful Data Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn


Beautiful Data

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn


Exploring Millions of Social Stereotypes


• How old do they look?

• Do you think they look smart?

• How do we perceive age, gender, and attractiveness?

Data Analysis!

The FaceStat Judging Interface

Preprocessing the Data

Problematic data

Aggregate results from multiple people into a single description

Map from multiple-choice responses to one numerical value
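Below is a minimal Python sketch of the last two steps. It assumes each judgment arrives as a (face id, answer) pair. The answer labels, the number assigned to each label (a rough age-range midpoint), and the sample judgments are all hypothetical. Answers outside the known choices are dropped, which is one crude way to handle problematic data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical multiple-choice answers and the single number each maps to.
AGE_CHOICES = {"0-5": 2.5, "6-12": 9.0, "13-19": 16.0, "20-29": 24.5, "30-44": 37.0, "45+": 55.0}


def aggregate_ages(judgments):
    """Combine many people's answers into one numeric age estimate per face.

    `judgments` is a list of (face_id, answer) pairs. Unrecognized answers
    are skipped, and each face's estimate is the mean of its mapped values.
    """
    values = defaultdict(list)
    for face_id, answer in judgments:
        if answer in AGE_CHOICES:
            values[face_id].append(AGE_CHOICES[answer])
    return {face_id: mean(vals) for face_id, vals in values.items()}


judgments = [("f1", "20-29"), ("f1", "30-44"), ("f2", "0-5"), ("f2", "n/a")]
print(aggregate_ages(judgments))
```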

Exploring the Data

Initial scatterplot matrix of the face data

Exploring the Data

Initial histogram of face age data

Exploring the Data

Histogram of cleaned face age data

Age, Attractiveness, and Gender

Scatterplot of attractiveness versus age, colored by gender

Age, Attractiveness, and Gender

Smoothed scatterplots for attractiveness versus age, colored by gender

Age, Attractiveness, and Gender

Three iterations of plotting attractiveness versus age versus gender: (a) values averaged within one-year age buckets, (b) a 95% confidence interval for each bucket, plus loess curves, and (c) larger buckets where the data are sparser.
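Panels (a) through (c) can be sketched in Python as follows. The slides do not say how the intervals were computed. This sketch assumes a normal approximation: the mean plus or minus 1.96 standard errors. The `width` parameter and the sample points are illustrative only.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def bucket_stats(points, width=1):
    """Group (age, attractiveness) points into age buckets `width` years wide.

    Returns {bucket_start: (mean, ci_low, ci_high)} using a normal-approximation
    95% confidence interval. Buckets with fewer than two points are skipped,
    because their standard deviation is undefined.
    """
    buckets = defaultdict(list)
    for age, score in points:
        buckets[int(age // width) * width].append(score)
    result = {}
    for start in sorted(buckets):
        scores = buckets[start]
        if len(scores) < 2:
            continue
        m = mean(scores)
        half = 1.96 * stdev(scores) / math.sqrt(len(scores))
        result[start] = (m, m - half, m + half)
    return result


# Hypothetical (age, attractiveness) points.
points = [(20, 3.0), (20.5, 5.0), (21, 4.0), (21, 4.0), (30, 2.0)]
print(bucket_stats(points))
print(bucket_stats(points, width=10))
```

A larger `width` is the simplest form of step (c). The variable-size buckets in the figure would need a different width for each age range.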

Age, Attractiveness, and Gender

Pearson correlation matrix
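The Pearson coefficient for two attributes is their covariance divided by the product of their standard deviations. A correlation matrix computes this coefficient for every pair of attributes. Here is a minimal Python sketch; the column names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical per-face averages, one list per attribute.
columns = {"age": [1, 2, 3, 4], "attractive": [2, 4, 6, 8], "smart": [4, 3, 2, 1]}
for (a, b), r in correlation_matrix(columns).items():
    print(f"{a:>10} {b:>10} {r:+.2f}")
```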

Clustering

Attractiveness versus age, colored by cluster, 2000 points.

Clustering

Cluster centroids, tags, and exemplars

Clustering

Cluster centroids, tags, and exemplars
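The slides do not say which clustering algorithm produced these centroids. As an assumption, here is plain k-means (Lloyd's algorithm) in Python. It also includes one common definition of an exemplar: the member point nearest its cluster centroid. The points are made-up (age, attractiveness) pairs. Initializing with the first k points only keeps the sketch deterministic; real use would pick random or k-means++ starting points.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def exemplars(points, centroids, labels):
    """For each cluster, return the member point closest to its centroid."""
    best = {}
    for p, lab in zip(points, labels):
        d = math.dist(p, centroids[lab])
        if lab not in best or d < best[lab][0]:
            best[lab] = (d, p)
    return {lab: p for lab, (_, p) in best.items()}


points = [(1, 1), (10, 10), (1, 2), (2, 1), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels, exemplars(points, centroids, labels))
```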


Conclusion

Our data reveal some familiar stereotypes:

Women are considered more attractive than men

Age has a stronger effect on attractiveness for women than for men

There are also some potential surprises:

Babies are considered the most attractive

Conservatives look more intelligent

The point of this example is not to reach any particular conclusion.

Instead, we want to show some of the rich set of significant patterns contained in a large, messy data set of human judgments.


Visualizing Urban Data

Crimespotting Project

Home Page


How to Get the Crime Data?

Get an image from the CrimeWatch server

Recognize the crime icons

Determine the location of each crime

Collect further details on the crime reports

A Sample Image

A sample image from CrimeWatch showing the locations of theft, narcotics, robbery, and other crimes.

A Sample Image

The same sample image from CrimeWatch with programmatically recognized icons outlined.
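The slides do not describe how the icons were recognized. One simple possibility, assumed here, is template matching. A known icon image is slid over the map image, and every position where all pixels agree within a tolerance is reported. Images are represented as hypothetical lists of rows of (r, g, b) tuples; a real scraper would load them with an imaging library.

```python
def find_icon(image, icon, tolerance=0):
    """Return the (row, col) of every place where `icon` occurs in `image`.

    Both arguments are lists of rows of (r, g, b) tuples. A position matches
    when every channel of every icon pixel is within `tolerance` of the
    corresponding image pixel.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(icon), len(icon[0])
    matches = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            if all(
                abs(image[r + dr][c + dc][k] - icon[dr][dc][k]) <= tolerance
                for dr in range(th)
                for dc in range(tw)
                for k in range(3)
            ):
                matches.append((r, c))
    return matches


W, R = (255, 255, 255), (200, 0, 0)  # white background, red icon pixels
icon = [[R, W], [R, R]]
image = [[W] * 4 for _ in range(4)]
image[1][2], image[2][2], image[2][3] = R, R, R  # place one copy of the icon
print(find_icon(image, icon))
```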

A Sample Image

The same sample image with the reddish parts made white to show the red boxing glove icon more clearly.

Geolocation

A map of downtown Oakland showing three reference points for triangulation purposes.
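Three non-collinear reference points are exactly enough to fix an affine transform from image coordinates to map coordinates, provided both their pixel positions and their longitude/latitude are known. An affine transform covers scale, rotation, skew, and offset. The slides call this step triangulation but do not give the math, so the affine model is an assumption, and the reference coordinates below are made up. Here is a minimal Python sketch that solves the transform with Cramer's rule.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ s = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("reference points must not be collinear")
    # Replace column i with v to get the numerator determinant for unknown i.
    return [det([row[:i] + [v[j]] + row[i + 1:] for j, row in enumerate(m)]) / d
            for i in range(3)]


def make_pixel_to_geo(refs):
    """refs: three ((px, py), (lon, lat)) pairs. Returns a function mapping a
    pixel (x, y) to (lon, lat) with the affine transform the pairs define."""
    m = [[px, py, 1.0] for (px, py), _ in refs]
    a, b, c = solve3(m, [geo[0] for _, geo in refs])
    d, e, f = solve3(m, [geo[1] for _, geo in refs])
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)


# Hypothetical reference points (pixel position, (longitude, latitude)).
refs = [((0, 0), (-122.3, 37.8)), ((100, 0), (-122.2, 37.8)), ((0, 100), (-122.3, 37.7))]
to_geo = make_pixel_to_geo(refs)
print(to_geo(50, 50))
```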

The Spotlight Feature

The type selector shows the total number of reports of each type in the selected time span.
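A type selector like this is driven by a simple filter-and-count. Here is a minimal Python sketch; the report tuples and dates are hypothetical and do not reflect Crimespotting's actual data format.

```python
from collections import Counter
from datetime import date


def type_totals(reports, start, end):
    """Count reports of each type whose date falls within [start, end]."""
    return Counter(kind for kind, day in reports if start <= day <= end)


reports = [
    ("theft", date(2008, 1, 3)),
    ("robbery", date(2008, 1, 5)),
    ("theft", date(2008, 1, 9)),
    ("narcotics", date(2008, 2, 1)),
]
print(type_totals(reports, date(2008, 1, 1), date(2008, 1, 31)))
```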


Conclusion

Crime is a serious issue for any urban resident, and visualizing crime data can help protect citizens more effectively.

The project has been a productive success, resulting in what we believe is a data service maximally useful to local residents.

City and government information is being moved onto the Internet to match the expectations of a connected, wired citizenry.

For more information about Crimespotting:

http://oakland.crimespotting.org/


Beautiful Political Data

Data Help Obama Win


Redistricting and Partisan Bias

Redistricting: the process of drawing United States electoral district boundaries, often in response to population changes determined by the results of the decennial census.

Partisan bias: a measure of how much the electoral system favors the Democrats or Republicans, after accounting for their vote share.
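One simple way to turn this definition into a number is the uniform-swing approximation. This method is an assumption, not necessarily how the estimates in these slides were produced. Every district's Democratic vote share is shifted by the same amount until the average district share is 50%. The bias is then how far the resulting Democratic seat share falls from 50%. The district shares below are made up.

```python
def partisan_bias(dem_shares):
    """Estimate partisan bias from district-level Democratic vote shares.

    Applies a uniform swing so the mean district share is exactly 0.5, then
    returns the Democratic seat share minus 0.5. Positive values mean the
    system favors Democrats; negative values mean it favors Republicans.
    """
    shift = 0.5 - sum(dem_shares) / len(dem_shares)
    seats_won = sum(1 for share in dem_shares if share + shift > 0.5)
    return seats_won / len(dem_shares) - 0.5


# Democrats "packed" into one district win only 1 of 4 seats at 50% of the vote.
print(partisan_bias([0.45, 0.46, 0.47, 0.70]))
```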

Redistricting and Partisan Bias

Effect of redistricting on partisan bias

Time Series of Estimates

Age and Voting

“Sure, young people voted heavily for Mr. Obama, but they voted heavily for John Kerry.” (Mark Penn, political consultant)

Was he right?

Age and Voting

Graphs showing recent patterns of voting by age

Localized Partisanship in Pennsylvania

Geographic partisanship in Pennsylvania


Conclusion

Political data is increasingly accessible and is increasingly being plotted and shared in the media and on the web.

At the research level, articles in political science journals are starting to make use of graphical techniques for discovery and presentation of results.

Statistical visualization is likely to become more important and more widespread in political analysis.


Data Finds Data


Data Finds Data

An example: corruption at the roulette wheel (past posting)

Data Finds Data

What can a “data finds data” system do for us?

Guest convenience

Customer service

On the way to “data finds data”:

Data Finds Data

What can a “data finds data” system do for us?

Improved child safety

Cross-compartment exploitation

Data Finds Data

What should we solve first? All of these examples benefit from just-in-time discovery. However, we should first solve the “enterprise discoverability” problem.

Federated search:

Lacks the indexes needed to locate a record efficiently.

Requires recursive processing.

Federated search cannot support the “data finds data” mission, because it cannot deliver enterprise discoverability at scale.

Directories are necessary!
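The contrast between federated search and a directory can be sketched in a few lines of Python. Everything here is hypothetical. `federated_search` scans every record in every system for each query. `Directory` keeps an index from identifier to record locations, so a lookup is a single dictionary access. A newly registered record also learns right away which existing records share its identifier, which is the "data finds data" idea. Real systems would also need fuzzy identity matching, which this sketch ignores.

```python
class Directory:
    """Hypothetical enterprise directory: identifier -> known record locations."""

    def __init__(self):
        self._index = {}

    def register(self, identifier, system, record_key):
        """Index a new record and return the locations already known for its
        identifier, so new data immediately "finds" related data."""
        known = list(self._index.get(identifier, []))
        self._index.setdefault(identifier, []).append((system, record_key))
        return known

    def locate(self, identifier):
        """One dictionary lookup instead of querying every system."""
        return list(self._index.get(identifier, []))


def federated_search(systems, identifier):
    """Without a directory, every record in every system is checked per query."""
    return [
        (name, key)
        for name, records in systems.items()
        for key, record in records.items()
        if record.get("name") == identifier
    ]


systems = {
    "hotel": {"h1": {"name": "Ann Lee"}},
    "casino": {"c7": {"name": "Ann Lee"}, "c8": {"name": "Bo Chen"}},
}
directory = Directory()
alerts = []
for system, records in systems.items():
    for key, record in records.items():
        if directory.register(record["name"], system, key):
            alerts.append((system, key))  # this new record matched a known one

print(alerts, directory.locate("Ann Lee"), federated_search(systems, "Ann Lee"))
```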


Conclusion

Determine how new observations relate to what is known.

Differentiate one organization from another.

Is likely to become another building block from which the next generations of advanced analytics will benefit.


Exploring Your Life in Data

Exploring Your Life in Data

Web: about sharing, broadcasting, and distributing, and also about tracking, monitoring, and analyzing one's own habits and behaviors.

Tools: PEIR (Personal Environmental Impact Report) and YFD (your.flowingdata)

Difference: PEIR runs in the background and uploads data automatically, whereas YFD requires users to enter data actively.


Some Examples

DietSense

Family Dynamics

Walkability

These applications are made possible by the sensors built into mobile phones.

All of them get people involved in their communities with nothing more than their mobile phones.


Visualization

• Traces are colored based on impact and exposure values.

• An alternative mapping scheme makes all trips on the map a single color and uses circles to encode impact and exposure.

• All traces are colored white, and the model values are represented by circles of varying size at the end of each trip.

• Greater values are displayed as circles with larger areas, and lesser values as circles with smaller areas.
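A circle's area grows with the square of its radius. To make area proportional to a value, the radius must therefore scale with the square root of that value. Here is a minimal Python sketch; the 20-pixel maximum radius is an arbitrary assumption.

```python
import math


def circle_radius(value, max_value, max_radius=20.0):
    """Radius for a circle whose area is proportional to `value`.

    The largest value (`max_value`) gets `max_radius`. The radius scales with
    the square root of the value so that area, not radius, tracks the data.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


# A trip with a quarter of the maximum impact gets half the radius,
# and therefore a quarter of the area.
print(circle_radius(25, 100))
```

Scaling the radius linearly instead would make a value twice as large look four times as big.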

Visualization

• We converted the map tiles to grayscale and inverted their colors, so that map items that were originally light turned dark and vice versa.

• Specifically, the terrain, originally light, is now dark gray, and roads, originally dark, are now light gray.

• This darkened map lets lightly colored traces stand out, and because the map is grayscale, there is less clashing.
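The tile treatment described above comes down to two per-pixel operations: convert to grayscale, then invert. Here is a minimal Python sketch. It uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), a common choice for grayscale conversion that is assumed here rather than taken from PEIR. The tile is a hypothetical list of rows of (r, g, b) tuples.

```python
def darken_pixel(rgb):
    """Grayscale an (r, g, b) pixel with BT.601 luma weights, then invert it."""
    r, g, b = rgb
    gray = round(0.299 * r + 0.587 * g + 0.114 * b)
    inverted = 255 - gray
    return (inverted, inverted, inverted)


def darken_tile(tile):
    """Apply darken_pixel to every pixel of a tile (a list of pixel rows)."""
    return [[darken_pixel(p) for p in row] for row in tile]


light_terrain, dark_road = (240, 235, 220), (40, 40, 40)
print(darken_tile([[light_terrain, dark_road]]))
```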


Visualization

• PEIR provides histograms to show distributions of impact and exposure for selected trips.


PEIR Interface


Design of the YFD Interface


Tracking Feelings and Emotions


Conclusion

People who collect data about themselves are not necessarily after the actual data.

They are mostly interested in the resulting information and how they can use their own data to improve themselves.

We use data visualization to teach and to draw interest.


The Design of Sense.us


The Design of Sense.us

Is data beautiful?

How to make it beautiful?

An example to demonstrate: sense.us



Quartet: An Example

Four data sets


Same statistical properties
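The classic data sets with this property are Anscombe's quartet (1973), which we assume is what this slide shows. The Python sketch below uses the published quartet values and computes their shared summary statistics. Every set has a mean x of 9, a sample variance of x of 11, a mean y of about 7.50, a correlation of about 0.816, and a regression line of about y = 3.00 + 0.500x. Their scatterplots, however, look completely different.

```python
from statistics import mean, variance

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {"mean_x": mx, "mean_y": my, "var_x": variance(xs), "var_y": variance(ys),
            "r": sxy / (sxx * syy) ** 0.5, "slope": slope, "intercept": my - slope * mx}


for name, (xs, ys) in QUARTET.items():
    print(name, {k: round(v, 2) for k, v in summary(xs, ys).items()})
```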


Quartet: An Example



Back to Sense.us

Consider the conventional choices:

To show the correlation between two numerical values: a scatterplot

To visualize change over time: a line graph

These conventions do not always apply.


Back to Sense.us

Our choice was influenced by Martin’s Baby Name Voyager visualization, a stacked graph of baby name popularity that became surprisingly popular online.


Stacked Graph

Job Voyager


Job Voyager visualization:

Left: an overview showing the composition of the labor force over 150 years;

Right: a filtered view showing the percentage of farmers.
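A stacked graph places each series on top of the running total of the series before it. A filtered view like the farmers panel can then be expressed as the share of the total contributed by the matching series. Here is a minimal Python sketch under those assumptions. The job names and counts are made up, and matching by name prefix is an assumption about how the filter works.

```python
def stack_layers(series):
    """Compute stacked-graph layers with a zero baseline.

    `series` maps a name to a list of values, one per time step. Returns
    {name: [(baseline, top), ...]}, where each layer sits on the cumulative
    total of the layers before it.
    """
    steps = len(next(iter(series.values())))
    base = [0.0] * steps
    layers = {}
    for name, values in series.items():
        layers[name] = [(b, b + v) for b, v in zip(base, values)]
        base = [b + v for b, v in zip(base, values)]
    return layers


def filter_share(series, prefix):
    """Percentage of the total at each time step contributed by the series
    whose names start with `prefix` (a hypothetical typed filter)."""
    steps = len(next(iter(series.values())))
    totals = [sum(v[t] for v in series.values()) for t in range(steps)]
    matched = [sum(v[t] for n, v in series.items() if n.startswith(prefix))
               for t in range(steps)]
    return [100.0 * m / total for m, total in zip(matched, totals)]


jobs = {"farmer": [60, 40], "farm laborer": [20, 10], "teacher": [20, 50]}
print(stack_layers(jobs)["teacher"], filter_share(jobs, "farm"))
```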


Differentiate Individual Series

This problem arose when we filtered the view to show only males or only females.

First attempt: enable perceptual discrimination by varying color saturation in an arbitrary fashion.

Better: rather than varying colors arbitrarily, do so in a meaningful, data-driven way.

We subsequently varied color saturation according to each occupation’s socio-economic index score.
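Mapping a socio-economic index score to color saturation can be as simple as a linear rescaling. Here is a minimal Python sketch using the standard library's `colorsys.hsv_to_rgb`. The linear mapping, the 0.3 to 1.0 saturation range, the fixed full brightness, and the score bounds are all assumptions, since the slides do not give the actual scheme.

```python
import colorsys


def occupation_color(hue, score, min_score, max_score, min_sat=0.3, max_sat=1.0):
    """Return an (r, g, b) color (0-255) for one occupation.

    `hue` (0-1) stays fixed for a group of related series. Saturation rises
    linearly from `min_sat` to `max_sat` as the socio-economic index score
    goes from `min_score` to `max_score`. Brightness is held at 1.0.
    """
    t = (score - min_score) / (max_score - min_score)
    saturation = min_sat + t * (max_sat - min_sat)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))


def saturation_of(rgb):
    """Recover HSV saturation from an (r, g, b) color, for checking."""
    return colorsys.rgb_to_hsv(*(c / 255 for c in rgb))[1]


low, high = occupation_color(0.0, 10, 10, 90), occupation_color(0.0, 90, 10, 90)
print(low, high)
```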


Stacked Graph

Birthplace Voyager


Birthplace Voyager visualization:

Left: an overview showing the distribution of birthplaces over 150 years;

Right: a filtered view showing the total number of European immigrants.


U.S. Census State Map and Scatterplot


Left: Interactive state map showing changes in each state’s population from 2000 to 2005;

Right: Scatterplot of U.S. states showing median household income (x-axis) versus retail sales (y-axis); New Hampshire and Delaware have the highest retail sales.


Population Pyramid


Population pyramid visualization:

Left: a comparison of the total number of males and females in each age group in 2000;

Right: the distribution of school attendees in 2000 (an annotation highlights the prevalence of adult education).


Conclusion

The combination of interactive visualization and social interpretation can help an audience more richly explore a data set.

The forms of analysis we observed in sense.us were exploratory in nature. The system had a clear educational benefit, and users reported that using sense.us was both enjoyable and informative.
