Enhancing Set-Analysis through Scalable Visualizations

May 09, 2006 CMSC 838S Information Visualization Spring 2006 1 Enhancing Set-Analysis through Scalable Visualizations Presented by: Hamid Haidarian Shahri ([email protected] ) Mudit Agrawal ([email protected] )

Page 1: Enhancing Set-Analysis  through Scalable Visualizations

May 09, 2006 CMSC 838S    Information Visualization Spring 2006 1

Enhancing Set-Analysis through Scalable Visualizations

Presented by:

Hamid Haidarian Shahri ([email protected])

Mudit Agrawal ([email protected])

Content

Problem Definition

Motivation

Dataset

Architecture

Visualization Methods

Interaction Tools

Demo

Future Work

Problem Definition

Analysis of sets by representing clusters graphically and depicting their internal and external links

Scaling visualization

Motivation

Sets are encountered in various domains:

websites

commodities

publications

anything that has attributes!

Visualization of sets to aid human perception is still an unsolved problem:

there are no direct relations between sets (or their elements) in the spatial domain

sets can be grouped based on various attributes

Dataset

2700 law cases

Each case identified by a numerical id ranging from 1000 to 3718

Each tuple in the dataset denotes a reference from one case to another

Relation is unidirectional and not symmetric (the referencing also implies a temporal constraint on the cases)
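As a minimal sketch of how such tuples could be turned into sets, the Python below groups references by case. It assumes each tuple has the form (citing id, cited id, citation string), with the first id being the case that makes the reference; the slides do not state the direction explicitly. The name `build_reference_sets` is hypothetical and not taken from the original project.

```python
from collections import defaultdict


def build_reference_sets(links):
    """Map each citing case id to the set of case ids it references."""
    refs = defaultdict(set)
    for citing, cited, _citation in links:
        # Several citation strings can point at the same cited case,
        # so a set keeps each cited id only once.
        refs[citing].add(cited)
    return dict(refs)


# Three tuples copied from the dataset snapshot.
links = [
    (1001, 1105, "100 S.Ct. 318"),
    (1001, 1105, "444 U.S. 51"),
    (1002, 1276, "100 S.Ct. 2138"),
]
reference_sets = build_reference_sets(links)
print(reference_sets)
```

Representing each case as the set of cases it references is what makes set-based similarity measures applicable later on.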

Snapshot of the data

First 50 links (approximately 0.1 percent of the whole dataset)

(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),(1001,1016,'112 S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),(1001,2923,'124 L.Ed.2d 539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),(1001,3410,'438 U.S. 104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),(1001,1018,'480 U.S. 470'),(1001,1016,'505 U.S. 1003'),(1001,2923,'508 U.S. 602'),(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d 210'),(1001,1612,'69 L.Ed.2d 1'),(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d 472'),(1001,3410,'98 S.Ct. 2646'),(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct. 3108'),(1002,1018,'107 S.Ct. 1232'),(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct. 2886'),(1002,1015,'114 S.Ct. 2309'),(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct. 2448'),(1002,1012,'122 S.Ct. 1465'),(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d 1319'),(1002,1013,'150 L.Ed.2d 592'),(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d 487'),(1002,3028,'306 F.3d 113'),(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473 U.S. 172'),(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S. 1003'),(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),(1002,3410,'57 L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),(1002,1889,'746 F.2d 135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),(1002,2319,'953 F.2d 1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),(1002,1022,'980 F.2d 84'),(1002,2670,'989 F.2d 362'),(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct. 2862'),(1003,1100,'106 S.Ct. 1018'),(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct. 2886'),(1003,3110,'116 S.Ct. 2432'),(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct. 1465'),(1003,1881,'13 F.3d 1192'),(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d 964'),(1003,1012,'152 L.Ed.2d 517'),(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d 1237'),(1003,2689,'271 F.3d 1090'),(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d 1171'),(1003,1040,'331 F.3d 891')

(1001,1105,'100 S.Ct. 318')

Architecture

Data

Clustering Module

Similarity Metric

Clustered Data

Visualization Module

Routine K-Means Clustering

Data points are in vector space.

$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

$x_j$ and $\mu_i$ are vectors. This assumption does not hold for cases represented as sets.

Centroids are not simple geometric means.

In fact, a mean does not make any sense.
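To make the vector-space assumption concrete, here is a small sketch of the usual k-means objective: the sum, over clusters, of squared distances from each point to its cluster mean. The function name `kmeans_objective` and the sample points are illustrative only. Note that computing the centroid needs addition and division of coordinates, operations that have no obvious counterpart for a set of case ids.

```python
def kmeans_objective(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # The centroid is the coordinate-wise mean of the cluster,
        # which requires adding and scaling vectors.
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        for p in points:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dim))
    return total


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],  # centroid (1, 0)
    [(5.0, 5.0), (5.0, 7.0)],  # centroid (5, 6)
]
objective = kmeans_objective(clusters)
print(objective)
```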

Routine Self Organizing Map

Wv and D are assumed to be vectors.

Wv(t + 1) = Wv(t) + Θ(t)α(t) [D(t) - Wv(t)]

This assumption does not hold.
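The update rule above can be sketched for ordinary numeric vectors as follows. This is an illustration of the standard rule only, with the neighborhood value Θ(t) and learning rate α(t) passed in as plain numbers; `som_update` is a hypothetical name. The subtraction `d - w` is exactly the step that is undefined when D is a set of references rather than a vector.

```python
def som_update(weight, sample, neighborhood, learning_rate):
    """One SOM step: W(t+1) = W(t) + theta * alpha * (D - W)."""
    step = neighborhood * learning_rate
    # Move every coordinate of the weight vector toward the sample.
    return [w + step * (d - w) for w, d in zip(weight, sample)]


weight = [0.0, 0.0]
sample = [1.0, 2.0]
updated = som_update(weight, sample, neighborhood=1.0, learning_rate=0.5)
print(updated)
```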

Similarity Measures

Jaccard similarity

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Reference-based similarity

$$S(A, B) = |A \cap B|$$

Weighted reference-based similarity

$$WS(A, B) = \frac{\sum_{x \in A \cap B} f(x)}{|A \cup B|}$$
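A minimal sketch of the three measures on Python sets follows. Jaccard similarity is the standard intersection-over-union ratio. The other two are readings of this slide and should be treated as assumptions: reference-based similarity is taken to be the number of shared references, and the weighted variant is taken to be the sum of a weight f(x) over shared references divided by the size of the union. The weight function and the example sets are made up for illustration.

```python
def jaccard(a, b):
    """|A n B| / |A u B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reference_similarity(a, b):
    """Number of references the two cases share (assumed reading)."""
    return len(a & b)


def weighted_reference_similarity(a, b, weight):
    """Sum of weight(x) over shared references, divided by |A u B| (assumed form)."""
    union = a | b
    if not union:
        return 0.0
    return sum(weight(x) for x in a & b) / len(union)


# Hypothetical weights, e.g. giving one frequently cited case more influence.
weights = {1018: 3.0, 1016: 1.0}


def weight(x):
    return weights.get(x, 1.0)


case_a = {1105, 1612, 1018, 1016}
case_b = {1018, 1016, 3410}
print(jaccard(case_a, case_b))
print(reference_similarity(case_a, case_b))
print(weighted_reference_similarity(case_a, case_b, weight))
```

None of these functions needs coordinates, which is why they can stand in for a geometric distance when the data points are sets.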

Contribution to clustering

Applying K-means and SOM for producing better visualizations

It is not apparent at first glance, but these algorithms are not directly applicable to set visualization

They assume a 2D or nD (vector) representation for each data point (i.e. law case). More specifically, the attributes must form a vector space.

This assumption does not hold: the dataset has no clear geometric attributes
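One common way around the vector-space requirement, shown here only as an illustration and not as the presenters' actual method, is to pick cluster centers from the data itself (k-medoids) so that only pairwise distances are needed. The sketch below uses Jaccard distance on sets; the function names, the naive initialization, and the toy data are all assumptions.

```python
def jaccard_distance(a, b):
    """1 - |A n B| / |A u B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)


def k_medoids(items, k, distance, max_iter=20):
    """Cluster `items` (dict: id -> set) around k medoids chosen from the data."""
    ids = sorted(items)
    medoids = ids[:k]  # naive deterministic initialization
    for _ in range(max_iter):
        # Assignment step: each item joins its closest medoid.
        clusters = {m: [] for m in medoids}
        for i in ids:
            nearest = min(medoids, key=lambda m: distance(items[i], items[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the medoid of an empty cluster
                continue
            best = min(
                members,
                key=lambda c: sum(distance(items[c], items[o]) for o in members),
            )
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy data: each case id maps to the set of cases it references.
cases = {
    1: {10, 11, 12},
    2: {10, 11},
    3: {10, 11, 12, 13},
    4: {20, 21},
    5: {20, 21, 22},
    6: {21, 22},
}
clusters = k_medoids(cases, k=2, distance=jaccard_distance)
print(sorted(clusters.values()))
```

No mean of sets is ever computed here; the cluster center is always an actual case from the data.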

Similarity Metrics Geometric Metrics

1-D Partitioning

2-D Partitioning

Sequential arrangement

Distance-based arrangement

1 2 5 9
3 4 7 12
6 8 11 14
10 13 15 16
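The 4 x 4 numbering on this slide (rows 1 2 5 9, 3 4 7 12, 6 8 11 14, 10 13 15 16) is consistent with ranking grid cells by their distance from the top-left corner. The rule used below is inferred from that grid and is not stated in the slides: cells are ordered by Manhattan distance, ties are broken by squared Euclidean distance, and remaining ties by row index. The function name `distance_ordered_grid` is hypothetical.

```python
def distance_ordered_grid(size):
    """Number the cells of a size x size grid by closeness to the top-left cell."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Rank by Manhattan distance, then squared Euclidean distance, then row.
    cells.sort(key=lambda rc: (rc[0] + rc[1], rc[0] ** 2 + rc[1] ** 2, rc[0]))
    grid = [[0] * size for _ in range(size)]
    for rank, (r, c) in enumerate(cells, start=1):
        grid[r][c] = rank
    return grid


grid = distance_ordered_grid(4)
for row in grid:
    print(*row)
```

With such an ordering, clusters ranked by similarity to a reference cluster could be placed so that the most similar ones end up nearest the corner.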

K-Means

[Figure: K-Means]

[Figure: SOM after K-Means]

Various Interactive Tools

Referencing pattern (activating all links)

Local referencing

Density map

Representative element

Tool tip

Link follow-up

Search

Referencing Pattern

Local Referencing

Density Map

Representative Element

Link Follow-up

DEMO

Future Work

Other clustering algorithms can be explored:

Spectral

Fuzzy C-means

More similarity functions

Better initial posting of data

Zooming and Panning

Thanks!