Post on 08-Aug-2020

DICON: Visual Analysis On Multidimensional Clusters

Nan Cao, David Gotz, Jimeng Sun, Huamin Qu

Topic: Cluster Analysis Link: http://en.wikipedia.org/wiki/Cluster_analysis

Applications:

• Biology

• Medicine

• Market research

• Education research

• Other applications

Cluster Analysis

[Figure: the dataset and its cluster analysis results for K = 3 and K = 5]

Ground Truth: The data contains 6 clusters

• Problems of cluster analysis

– The clustering result does not always precisely reveal the ground truth of the data

– Cluster analysis depends heavily on the experience of the analyst. It is very unlikely that the ground truth is found within a single iteration

– For a multidimensional dataset, it is difficult to explain the meaning of the clusters
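The first problem can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm) as the clustering method and synthetic 2-D data with six well-separated groups; neither the method nor the data comes from the slides, and the helper name `kmeans` is hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: returns a cluster label for every point."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

# Synthetic ground truth: six groups of 30 points each (an assumption).
rng = np.random.default_rng(42)
true_centers = np.array(
    [[0, 0], [10, 0], [20, 0], [0, 10], [10, 10], [20, 10]], dtype=float
)
data = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in true_centers])
truth = np.repeat(np.arange(6), 30)

mixed = {}
for k in (3, 5):
    labels = kmeans(data, k)
    # Count clusters that contain points from more than one true group.
    mixed[k] = int(sum(len(np.unique(truth[labels == c])) > 1 for c in range(k)))
    print(f"K = {k}: {mixed[k]} cluster(s) mix several ground-truth groups")
```

With K smaller than the true number of groups, at least one cluster must mix groups, so no single run with K = 3 or K = 5 can recover the six-cluster ground truth.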

How can information visualization aid cluster analysis?

Challenges

• How can we interpret the multidimensional cluster results?

• How can we make comparisons among multidimensional clusters?

• How can we refine the clustering results and detect multidimensional patterns?

Solution

• Goal:

– Design a novel visualization for multidimensional cluster analysis that facilitates cluster interpretation, quality evaluation, comparison and manipulation

• Approach:

– A multidimensional cluster icon design that encodes multiple data attributes as well as derived statistical information for cluster interpretation

– A stabilized icon layout algorithm that generates similar icons for similar clusters, to support cluster comparison

– New visual cues that evaluate cluster quality and highlight information patterns, together with intuitive user interactions driven by these cues that support cluster refinement via direct manipulation of icons

How can we interpret the multidimensional cluster result in detail?

Using an iconic design to visualize multidimensional clusters at multiple granularities:

• Encoding the single entity

• Packing entities into clusters

• Global layout

Visual Encoding

Encoding data items in detail

Packing Entities into clusters

[Figure: an entity with feature values 0.3, 0.2, 0.1, 0.1, 0.2, 0.1; feature labels: cancer, diabetes, kidney disorder, heart disease, fever, high blood pressure]
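The data side of packing entities into a cluster can be sketched as follows. This is only an illustration under stated assumptions: each entity is assumed to occupy one unit of icon area, split among its features in proportion to the feature values, and the pairing of the six values with the six feature labels is assumed from their order on the slide. The function name `cluster_feature_shares` and the second patient are hypothetical.

```python
import numpy as np

FEATURES = ["cancer", "diabetes", "kidney disorder",
            "heart disease", "fever", "high blood pressure"]

def cluster_feature_shares(entities):
    """Share of a cluster icon's area taken by each feature.

    Assumes every entity has at least one non-zero feature value.
    """
    entities = np.asarray(entities, dtype=float)
    # Each entity contributes one unit of area, split by feature value.
    cells = entities / entities.sum(axis=1, keepdims=True)
    totals = cells.sum(axis=0)
    return totals / totals.sum()

patients = [
    [0.3, 0.2, 0.1, 0.1, 0.2, 0.1],  # the feature vector shown on the slide
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0],  # hypothetical second patient
]
shares = cluster_feature_shares(patients)
for name, share in zip(FEATURES, shares):
    print(f"{name}: {share:.2f}")
```

Under these assumptions the same proportional rule applies at the feature, entity and cluster levels, which is the property the design aims for.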

Global Layout

E.g. the patient dataset

Design Guideline 1: Intuitively share the same visual encodings at the feature level, the entity level and the cluster level

DEMO

How can we make comparisons among multidimensional clusters?

[Figure: an entity with feature values 0.3, 0.2, 0.1, 0.1, 0.2, 0.1; feature labels: cancer, diabetes, kidney disorder, heart disease, HIV, high blood pressure]

Design Guideline 2: Similar clusters should be represented by similar icons

– Overview: Similar clusters have similar data distributions

– Details: Similar clusters must be laid out in a similar way

Statistical Embedding (overview)

Stabilized Icon Layout (detail)

Statistical Embedding(1)

• Kurtosis

• Skewness
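Both statistics can be computed directly from central moments. The sketch below uses the standard population (biased) estimators and assumes they are applied to a one-dimensional sample of values; the slides do not state which values are summarized or how the results are drawn in the icon.

```python
import numpy as np

def skewness(values):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

Both functions divide by the variance, so they are undefined for a constant sample.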

Stabilized Layout

1. Initial spiral layout

2. Weighted Centroidal Voronoi Tessellation

3. Random layout for features

4. Optimization

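Objective: minimize, over the positions $X_i$, a weighted sum of three terms:

– Centroid: based on the distance between $X_i$ and $c_i$

– Similarity: based on the distances between $X_i$ and $X_j$ and on $d_{ij}$

– Smoothness: based on the distance between $X_i$ and $pre(X_i)$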
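Steps 1 and 2 of the stabilized layout can be approximated with a short sketch. It assumes an Archimedean spiral for the initial placement and a discrete Lloyd relaxation in which every site moves to the density-weighted centroid of its Voronoi cell; the exact weighting used by DICON and the optimization step are not reproduced here. The names `spiral_layout`, `lloyd_step`, `samples` and `density` are hypothetical.

```python
import numpy as np

def spiral_layout(n, turns=3.0):
    """Place n sites along an Archimedean spiral inside the unit square."""
    t = np.linspace(0.0, 1.0, n)
    angle = 2 * np.pi * turns * t
    radius = 0.45 * t
    return 0.5 + np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

def lloyd_step(sites, samples, density):
    """Move every site to the density-weighted centroid of its Voronoi cell."""
    dists = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=2)
    owner = dists.argmin(axis=1)          # nearest site for each sample point
    new_sites = sites.copy()
    for i in range(len(sites)):
        mask = owner == i
        if mask.any():                    # sites with an empty cell stay put
            new_sites[i] = np.average(samples[mask], axis=0, weights=density[mask])
    return new_sites

# The unit square is discretized into a 40 x 40 grid of sample points.
grid = np.linspace(0.0, 1.0, 40)
samples = np.array([(x, y) for x in grid for y in grid])
density = np.ones(len(samples))           # uniform weights (an assumption)

sites = spiral_layout(12)
for _ in range(30):
    sites = lloyd_step(sites, samples, density)
print(sites.round(3))
```

Starting every icon from the same deterministic spiral, rather than from random positions, is one way to make similar inputs converge to similar layouts.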

Design Guideline 3: Fit in multiple scales and can be embedded into various other visualizations

– Both color and shape are highly scalable and remain distinguishable even in a very small area

How can we refine the clustering results and detect interesting patterns within the multidimensional clusters?

Interactive visual analysis driven by visual cues

Cluster Quality Cue

Cluster Quality: Defined by the signed variances of the entities it contains

– $f$: the feature vector of a single entity

– $\mu$: the mean feature vector of the cluster C that contains $f$

– $\sigma$: the variance between $f$ and $\mu$

Signed variance: $\sigma$ multiplied by $sign(f) \in \{1, -1\}$

High-quality clusters have a homogeneous representation

Low-quality clusters have a heterogeneous representation
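A minimal sketch of a signed variance follows. The rule that decides the sign is not recoverable from the slide, so this example assumes a per-feature definition: the squared deviation of each feature value from the cluster mean, carrying the sign of that deviation. The function name `signed_variance` and both sample clusters are hypothetical.

```python
import numpy as np

def signed_variance(cluster):
    """Per-entity, per-feature squared deviation from the cluster mean,
    signed by the direction of the deviation (an assumed definition)."""
    cluster = np.asarray(cluster, dtype=float)
    mu = cluster.mean(axis=0)             # mean feature vector of the cluster
    dev = cluster - mu
    return np.sign(dev) * dev ** 2

homogeneous = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
heterogeneous = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

print(signed_variance(homogeneous))      # all zeros: every entity matches the mean
print(signed_variance(heterogeneous))    # positive and negative entries
```

Under this assumption a high-quality cluster yields values near zero everywhere, while a low-quality cluster yields a mix of clearly positive and negative values.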

Feature Co-occurrence and Dominant Cue

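Feature vector: (f1, f2, f3, f4, f5, f6), with f2 and f5 highlighted

– If $f_i > 0$, we say that feature $f_i$ occurs

– If $f_i > 0$ and $f_j > 0$ in the same feature vector, we say that $f_i$ and $f_j$ co-occur

Co-occurrence Score: $C_i$, accumulated over the features $f_j$ from the conditional probability $p$ that one of $f_i$, $f_j$ is positive given that the other is positive

Co-occurrence Cue: Highlight the features that most often co-occur with others

Dominant Cue: Highlight the features that do not co-occur with any other feature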
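The two cues can be sketched as below. The exact score formula is not recoverable from the slide, so this example assumes the score of feature i is the sum, over all other features j, of the conditional probability p(f_j > 0 | f_i > 0) estimated from the entities of a cluster. The function names and the sample vectors are hypothetical.

```python
import numpy as np

def cooccurrence_scores(vectors):
    """Assumed score: sum over j != i of p(f_j > 0 | f_i > 0)."""
    occurred = (np.asarray(vectors) > 0).astype(float)   # entities x features
    counts = occurred.T @ occurred        # counts[i, j]: entities with both features
    occ = np.diag(counts)                 # entities in which feature i occurs
    cond = np.divide(counts, occ[:, None],
                     out=np.zeros_like(counts), where=occ[:, None] > 0)
    np.fill_diagonal(cond, 0.0)           # a feature does not co-occur with itself
    return cond.sum(axis=1)

def dominant_features(vectors):
    """Features that occur but never together with any other feature."""
    occurred = np.asarray(vectors) > 0
    scores = cooccurrence_scores(vectors)
    return [i for i in range(occurred.shape[1])
            if occurred[:, i].any() and scores[i] == 0]

vectors = [
    [0.6, 0.4, 0.0],   # features 0 and 1 co-occur
    [0.5, 0.5, 0.0],   # features 0 and 1 co-occur
    [0.0, 0.0, 1.0],   # feature 2 occurs alone
]
scores = cooccurrence_scores(vectors)
dominant = dominant_features(vectors)
print(scores, dominant)
```

In this sample the co-occurrence cue would highlight features 0 and 1, and the dominant cue would highlight feature 2.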

Interactions and Animated Transition

• Interactions

– Attribute group

– Split: binary split / outlier split

– Merge: drag merge and select merge

• Animation Path Bundling

– Aggregate the animation paths with similar trends

– Inspired by hierarchical edge bundling

• Demo

Evaluation

Comparing with other techniques

Advantages:

• The clusters are easy to identify

• Immediately conveys the size of each cluster

• Fast comparison

• Highly compressed, can be embedded into other visualizations

• Based on intuitive designs

Disadvantages:

• Multidimensional data only

• No precise value is directly observed

• Entities are split into multiple parts

Case Study (1) Study on Patient Similarity

1. Find a group of patients that are similar to a target patient. The similarity is automatically computed based on five features

2. An initial clustering result is given

3. Users are required to refine the clusters and interpret why the patients in a cluster are similar

Case Study (2)

By highlighting all the co-occurring features, we find different disease distribution patterns

User Study

• T1: Compare the feature details of 9 clusters

• T2: Compare a large set of clusters (50 clusters)

• 3 (groups) × 10 (users) × 2 (tasks)

– Icons laid out randomly

– Icons laid out by our algorithm

– With statistical embedding

User Study Results

• Findings:

– The cluster icon design is extremely efficient for cluster comparison (12 s on average to compare 50 clusters)

– The proposed design principles greatly help comparison

Related Work

• Pixel Based Technique

• Iconic Techniques

• Parallel Coordinates

• Scatter Plots

Prior Art: Icon-based techniques

• Chernoff face visualization

• Stick figure technique

– Two dimensions are mapped to the display dimensions and the remaining dimensions are mapped to the angles and/or limb lengths of the stick figure icon

– The number of dimensions that can be visualized is limited

• Shape encoding

• Color icons

Prior Art: Pixel-Oriented Techniques

• Query Independent

– Space-Filling Curve Arrangements

– Recursive Pattern Technique

• Query Dependent

– Spiral Technique

– Axes Technique

– Circle Segments

Prior Art: Table-based techniques

• Table Lens

• Tableau

• Heat Map

Prior Art: Others (Hybrid Techniques)

• NodeTrix: a Hybrid Visualization of Social Networks. Nathalie Henry, Jean-Daniel Fekete, Michael J. McGuffin, InfoVis 2007

• Scattering Points in Parallel Coordinates. Xiaoru Yuan, Peihong Guo, He Xiao, Hong Zhou, Huamin Qu, InfoVis 2009

• Bubble Sets: Revealing Set Relations with Isocontours over Existing Visualizations. Christopher Collins, Gerald Penn, Sheelagh Carpendale, InfoVis 2009

• Rolling the Dice: Multidimensional Visual Exploration using Scatterplot Matrix Navigation. Niklas Elmqvist, Pierre Dragicevic, Jean-Daniel Fekete, InfoVis 2008

• Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics. Sara Johansson, Jimmy Johansson, InfoVis 2009

• FacetAtlas: Multifaceted Visualization for Rich Text Corpora. InfoVis 2010
