May 09, 2006 CMSC 838S Information Visualization Spring 2006 1
Enhancing Set-Analysis through Scalable Visualizations
Presented by:
Hamid Haidarian Shahri ([email protected])
Mudit Agrawal ([email protected])
Content
Problem Definition
Motivation
Dataset
Architecture
Visualization Methods
Interaction Tools
Demo
Future Work
Problem Definition
Analysis of sets by representing the clusters graphically and depicting their internal and external links
Scaling visualization
Motivation
Sets are encountered in various domains: websites, commodities, publications, anything that has attributes!
Visualization of sets to aid human perception is still an unsolved problem:
no direct relations between sets (or their elements) in the spatial domain
they can be grouped based on various attributes
Dataset
2700 law cases
Each case is identified by a numerical id ranging from 1000 to 3718
Each tuple in the dataset represents a reference between two cases
The relation is unidirectional and not symmetric (a reference also implies a temporal constraint on the cases)
Snapshot of the data
First 50 links (approximately 0.1 percent of whole dataset)
(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),(1001,1016,'112 S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),(1001,2923,'124 L.Ed.2d 539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),(1001,3410,'438 U.S. 104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),(1001,1018,'480 U.S. 470'),(1001,1016,'505 U.S. 1003'),(1001,2923,'508 U.S. 602'),(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d 210'),(1001,1612,'69 L.Ed.2d 1'),(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d 472'),(1001,3410,'98 S.Ct. 2646'),
(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct. 3108'),(1002,1018,'107 S.Ct. 1232'),(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct. 2886'),(1002,1015,'114 S.Ct. 2309'),(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct. 2448'),(1002,1012,'122 S.Ct. 1465'),(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d 1319'),(1002,1013,'150 L.Ed.2d 592'),(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d 487'),(1002,3028,'306 F.3d 113'),(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473 U.S. 172'),(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S. 1003'),(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),(1002,3410,'57 L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),(1002,1889,'746 F.2d 135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),(1002,2319,'953 F.2d 1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),(1002,1022,'980 F.2d 84'),(1002,2670,'989 F.2d 362'),
(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct. 2862'),(1003,1100,'106 S.Ct. 1018'),(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct. 2886'),(1003,3110,'116 S.Ct. 2432'),(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct. 1465'),(1003,1881,'13 F.3d 1192'),(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d 964'),(1003,1012,'152 L.Ed.2d 517'),(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d 1237'),(1003,2689,'271 F.3d 1090'),(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d 1171'),(1003,1040,'331 F.3d 891')
(1001,1105,'100 S.Ct. 318')
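As a minimal sketch of how tuples in this format could be turned into sets for analysis, the Python below parses a few tuples copied from the snapshot and groups them by their first field. It assumes the first id is the referencing case and the second id is the referenced case; the slides only state that the relation is unidirectional, so the direction is an assumption, and `parse_links` and `reference_sets` are illustrative names rather than anything from the presentation.

```python
import re
from collections import defaultdict

# A few tuples copied from the snapshot above.
SNAPSHOT = (
    "(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),"
    "(1001,1105,'444 U.S. 51'),(1002,1276,'100 S.Ct. 2138'),"
    "(1002,1018,'107 S.Ct. 1232'),(1001,1018,'107 S.Ct. 1232')"
)

# Matches one (id, id, 'citation') tuple.
TUPLE_RE = re.compile(r"\((\d+),(\d+),'([^']*)'\)")


def parse_links(text):
    """Return a list of (first_id, second_id, citation) tuples."""
    return [(int(a), int(b), c) for a, b, c in TUPLE_RE.findall(text)]


def reference_sets(links):
    """Group second ids by first id (assumed: first case references second)."""
    sets = defaultdict(set)
    for source, target, _citation in links:
        sets[source].add(target)
    return dict(sets)


links = parse_links(SNAPSHOT)
refs = reference_sets(links)
print(refs)
```

The same pair of ids can appear with more than one citation string (for example 1001 and 1105 appear with both '100 S.Ct. 318' and '444 U.S. 51'), so collecting the ids into a set keeps each referenced case once.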
Architecture
Data -> Clustering Module (using the Similarity Metric) -> Clustered Data -> Visualization Module
Routine K-Means Clustering
Data points are in vector space: x_j and μ_i are vectors.
$$V = \sum_{i=1}^{k} \sum_{j \in S_i} \lvert x_j - \mu_i \rvert^2$$
This assumption does not hold for cases represented as sets.
Centroids are not simple geometric means.
In fact, the mean does not make any sense.
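To make the vector-space assumption concrete, here is a small sketch of the objective that routine K-means minimizes: the sum of squared distances from each point to the mean of its cluster. The toy points, the fixed assignment, and the name `kmeans_objective` are illustrative assumptions. The computation needs a per-dimension mean, which is exactly what a set of referenced cases does not offer.

```python
def kmeans_objective(points, assignment, k):
    """Sum of squared Euclidean distances of each point to its cluster mean."""
    dims = len(points[0])
    total = 0.0
    for i in range(k):
        members = [p for p, a in zip(points, assignment) if a == i]
        if not members:
            continue  # an empty cluster contributes nothing
        # The centroid is the coordinate-wise mean; this requires vectors.
        mean = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum(
            sum((p[d] - mean[d]) ** 2 for d in range(dims)) for p in members
        )
    return total


# Toy 2-D data with a fixed cluster assignment.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
assignment = [0, 0, 1, 1]
V = kmeans_objective(points, assignment, k=2)
print(V)
```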
Routine Self Organizing Map
Wv and D are assumed to be vectors.
Wv(t + 1) = Wv(t) + Θ(t)α(t) [D(t) - Wv(t)]
This assumption does not hold.
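The update rule above can be sketched for ordinary vector data as follows. This is an illustration under stated assumptions, not the presenters' code: the map is a 1-D chain of nodes, Θ is taken to be a Gaussian neighborhood around the best matching unit, and both α and the neighborhood width decay exponentially with t. The names `som_update` and `best_matching_unit` and the parameters `alpha0`, `sigma0`, `tau` are hypothetical.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(
        range(len(weights)),
        key=lambda v: sum((wi - di) ** 2 for wi, di in zip(weights[v], sample)),
    )


def som_update(weights, bmu, sample, t, alpha0=0.5, sigma0=1.0, tau=10.0):
    """Apply Wv(t+1) = Wv(t) + Theta(t) * alpha(t) * [D(t) - Wv(t)] to each node."""
    alpha = alpha0 * math.exp(-t / tau)  # assumed learning-rate schedule
    sigma = sigma0 * math.exp(-t / tau)  # assumed neighborhood-width schedule
    updated = []
    for v, w in enumerate(weights):
        # Assumed Gaussian neighborhood on a 1-D chain of nodes.
        theta = math.exp(-((v - bmu) ** 2) / (2 * sigma ** 2))
        updated.append([wi + theta * alpha * (di - wi) for wi, di in zip(w, sample)])
    return updated


weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
sample = [1.0, 1.2]
bmu = best_matching_unit(weights, sample)
new_weights = som_update(weights, bmu, sample, t=0)
print(bmu, new_weights)
```

The subtraction D(t) - Wv(t) is the step that requires both quantities to be vectors, which is why the rule cannot be applied as-is to cases represented as sets.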
Similarity Measures
Jaccard similarity
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Reference-based similarity
$$S(A, B) = |A \cap B|$$
Weighted reference-based similarity
$$WS(A, B) = \frac{\sum_{x \in A \cap B} f(x)}{|A \cup B|}$$
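A minimal Python sketch of the three measures follows. It assumes the readings J(A, B) = |A ∩ B| / |A ∪ B|, S(A, B) = |A ∩ B|, and WS(A, B) = (sum of f(x) over x in A ∩ B) / |A ∪ B|. The slides do not define f, so the weight function here is a hypothetical lookup table, and returning 0.0 for two empty sets is a convention chosen for this example.

```python
def jaccard(a, b):
    """Jaccard similarity: shared elements over all elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reference_similarity(a, b):
    """Reference-based similarity: number of shared references."""
    return len(a & b)


def weighted_reference_similarity(a, b, f):
    """Weighted variant: shared references weighted by f, over the union size."""
    union = a | b
    if not union:
        return 0.0
    return sum(f(x) for x in a & b) / len(union)


# Toy reference sets using case ids that appear in the snapshot.
case_a = {1105, 1612, 1018, 1016}
case_b = {1018, 1016, 1276}

# Hypothetical weights: unlisted cases default to 1.0.
weights = {1016: 2.0}


def weight_of(x):
    return weights.get(x, 1.0)


print(jaccard(case_a, case_b))
print(reference_similarity(case_a, case_b))
print(weighted_reference_similarity(case_a, case_b, weight_of))
```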
Contribution to clustering
Applying K-means and SOM for producing better visualizations
Although it is not apparent at first glance, these algorithms are not directly applicable to set visualization
They assume a 2D or nD (vector) representation for each data point (i.e. law case). More specifically, the attributes must form a vector space.
This assumption does not hold: the dataset has no clear geometric attributes
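The slides do not say how K-means was adapted to sets. One common workaround, shown here only as an illustration and not as the presenters' method, is a k-medoids-style loop: each cluster is represented by one of its own members instead of a mean, so only a pairwise distance is needed. The sketch below uses Jaccard distance; the toy data, the function names, and the choice of initial medoids are all assumptions.

```python
def jaccard_distance(a, b):
    """1 - Jaccard similarity; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)


def k_medoids(sets, medoids, max_iter=20):
    """Cluster a dict of id -> set around distinct initial medoid ids."""
    medoids = list(medoids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for item, members in sets.items():
            nearest = min(medoids, key=lambda m: jaccard_distance(members, sets[m]))
            clusters[nearest].append(item)
        # Update step: the new medoid is the member with the smallest
        # total distance to the rest of its cluster.
        new_medoids = []
        for medoid, group in clusters.items():
            if not group:
                new_medoids.append(medoid)  # keep a medoid that attracted nothing
                continue
            new_medoids.append(
                min(
                    group,
                    key=lambda c: sum(
                        jaccard_distance(sets[c], sets[o]) for o in group
                    ),
                )
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Toy data: ids map to sets of referenced items.
toy = {
    "a": {1, 2, 3},
    "b": {1, 2, 4},
    "c": {2, 3},
    "d": {7, 8, 9},
    "e": {8, 9},
}
clusters = k_medoids(toy, medoids=["a", "d"])
print(clusters)
```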
Similarity Metrics Geometric Metrics
1-D Partitioning
2-D Partitioning: sequential arrangement, distance-based arrangement
1 2 5 9
3 4 7 12
6 8 11 14
10 13 15 16
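The slide does not state the rule behind the numbering in the 4x4 grid above. One ordering that reproduces it, offered here as an assumption, sorts the cells by Manhattan distance from the top-left corner, breaking ties first by the larger of the two coordinates and then by row. The function name `distance_ordered_grid` is illustrative.

```python
def distance_ordered_grid(n):
    """Number the cells of an n x n grid outward from the top-left corner."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Assumed sort key: Manhattan distance, then larger coordinate, then row.
    cells.sort(key=lambda rc: (rc[0] + rc[1], max(rc), rc[0]))
    grid = [[0] * n for _ in range(n)]
    for rank, (r, c) in enumerate(cells, start=1):
        grid[r][c] = rank
    return grid


grid = distance_ordered_grid(4)
for row in grid:
    print(" ".join(str(value) for value in row))
```

Under this ordering, items that are close in rank land in cells that are close to each other and to the corner, which is one way a distance-based arrangement can map a 1-D ordering onto a 2-D layout.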
K-Means
SOM after K-Means
Various Interactive Tools
Referencing pattern (activating all links)
Local referencing
Density map
Representative element
Tool tip
Link follow-up
Search
Referencing Pattern
Local Referencing
Density Map
Representative Element
Link Follow-up
DEMO
Future Work
Other clustering algorithms can be explored: spectral clustering, fuzzy C-means
More similarity functions
Better initial positioning of data
Zooming and Panning
Thanks!