1
Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases
Nishant Mehta, Elke A. Rundensteiner and Matt Ward
Computer Science Department
Worcester Polytechnic Institute
IDEAS’05
Thank you to NSF for several IDM grants for XMDV project.
3
XmdvTool: Multivariate Data Visualization
Example: Cars Data Set

MPG   Cyl.  HP    Wt.
18    8     130   3504
17    8     132   3700
...
40    2     100   2500

[Figure: parallel coordinate display; dataset with 4096 points in XmdvTool 6.0]
4
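Hierarchical Displays [Fua:1999]
Cars Data Set

Base data points:
      MPG    Cyl.   HP      Wt.
C     6      8      130     3504
D     4      8      132     3000
G     10     6      110     2800
H     12     2      70      2100
I     15     2      80      2200
J     15     2      100     2500

Cluster nodes (means of the base points they contain):
      MPG    Cyl.   HP      Wt.
B     5      8      131     3252
F     12.3   3.33   86.66   2366.6
E     13     3      90      2400
A     10.33  4.66   103.66  2684

Cluster tree: A is the root with children B and E; B contains C and D; E contains F and J; F contains G, H and I.

Structure-based brush components:
• b: level of detail
• d: focus area
• e: focus extents

[Figure: cluster tree over nodes A through J with a structure-based brush]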
5
Hierarchical Displays
6
Problems: Hierarchical Display
Achieved: screen-space solution to the clutter problem
But the data handling problem remains …
• Cluster tree size greater than initial tree
• Cluster tree may not fit into main memory
• Structure-based brush semantics involve recursive searches over cluster tree
7
Goal
Overall goal: scale hierarchical displays to support navigation over large hierarchies
Subgoals:
• Support navigation over large-scale persistent data
  • Store hierarchies on disk
  • Map navigation operations to efficient queries
• Meet interactive response requirements
8
Overview of Approach
• Support navigation operations over large-scale persistent data: Hierarchy Encoding, Spatial Indexing
• Meet interactive response requirements: Caching, Prefetching
9
Hierarchy Encoding
Problem:
• Structure-based brush selection semantics involve recursive search
• Recursive search over secondary storage is slow
Solution: hierarchy encoding
• Push recursive processing into a precomputation step
• Precompute a label for each node in the hierarchy
• Map the recursive search to an equivalent non-recursive one
[Figure: hierarchical data passes through a labeling (hierarchy encoding) step and is stored in the database]
10
Structure-Based Brush Semantics [Fua:1999]
Node selection based on 2 steps:
• Horizontal selection: subtree (e1, e2)
• Vertical selection: level of detail (lod)
[Figure: cluster tree over nodes A through J annotated with level-of-detail values]
11
Horizontal Selection
Aim: select the subtree that the user is interested in viewing
Approach:
• Using the brush focus extents (e1, e2), select a set of base points
• Propagate selection: select parent(n) if n is selected
[Figure: cluster tree with selected leaves and selected clusters highlighted; (e1,e2) = (2/6, 11/12), lod = 0.4]
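The propagation step can be sketched as below. This is a minimal illustration rather than the XmdvTool code: `PARENT` encodes the example cluster tree, `propagate` is a hypothetical helper, and the selected base points G, H, I, J are assumed to be the leaves covered by the example brush.

```python
# Example cluster tree as a child -> parent map; A is the root.
PARENT = {"B": "A", "E": "A", "C": "B", "D": "B",
          "F": "E", "J": "E", "G": "F", "H": "F", "I": "F"}

def propagate(selected_leaves):
    """Return the selected leaves plus every ancestor of a selected leaf."""
    selected = set()
    for node in selected_leaves:
        # Walk upward until the root or an already-selected ancestor is reached.
        while node is not None and node not in selected:
            selected.add(node)
            node = PARENT.get(node)
    return selected

# Assumed base points under the example brush (2/6, 11/12).
print(sorted(propagate({"G", "H", "I", "J"})))
```

Each request repeats this upward walk, which is the kind of per-request tree traversal that becomes slow once the tree lives on disk.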
12
Non-Recursive Horizontal Selection
Offline: precompute an interval (hmin, hmax) for each node; the interval of a parent includes the intervals of its children
Online: search for nodes whose interval intersects the brush interval (e1, e2)
Intervals in the example:
• Leaves: C (0,1/6), D (1/6,2/6), G (2/6,3/6), H (3/6,4/6), I (4/6,5/6), J (5/6,1)
• Clusters: B (0,2/6), F (2/6,5/6), E (2/6,1), A (0,1)
[Figure: cluster tree labeled with the interval of each node; (e1,e2) = (2/6, 11/12), lod = 0.4]
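A minimal sketch of the offline labeling and the online interval test follows. It assumes that the k leaves receive equal slots of width 1/k in left-to-right order, which matches the 1/6 steps of the example; `label` and `horizontal_select` are hypothetical names, not functions from the actual system.

```python
from fractions import Fraction

# Example cluster tree (parent -> ordered children); names follow the slides.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}

def label(root):
    """Offline step: compute (hmin, hmax) for every node.

    Assumption: the k leaves, left to right, get equal slots of width 1/k.
    """
    intervals = {}
    leaves = []

    def collect(node):
        kids = CHILDREN.get(node, [])
        if not kids:
            leaves.append(node)
        for kid in kids:
            collect(kid)

    def cover(node):
        kids = CHILDREN.get(node, [])
        if kids:
            spans = [cover(kid) for kid in kids]
            # A parent's interval spans the intervals of all its children.
            intervals[node] = (min(s[0] for s in spans), max(s[1] for s in spans))
        return intervals[node]

    collect(root)
    width = Fraction(1, len(leaves))
    for i, leaf in enumerate(leaves):
        intervals[leaf] = (i * width, (i + 1) * width)
    cover(root)
    return intervals

def horizontal_select(intervals, e1, e2):
    """Online step: flat scan with the closed comparisons hmin <= e2, hmax >= e1."""
    return sorted(n for n, (hmin, hmax) in intervals.items()
                  if hmin <= e2 and hmax >= e1)

INTERVALS = label("A")
print(horizontal_select(INTERVALS, Fraction(2, 6), Fraction(11, 12)))
```

With closed comparisons, nodes that only touch the brush at an endpoint (here B and D, whose hmax equals e1 = 2/6) are also returned; a strict test `hmax > e1` would leave them out.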
13
Vertical Selection
Aim: select points at the desired level of detail (the lod handle of the structure-based brush, SBB)
Approach: explore each branch starting at the root to find the first node n with lod(n) <= lod(brush)
[Figure: cluster tree with lod values A 0.6, B 0.2, E 0.5, F 0.3 and 0 at the leaves; SBB: (e1,e2) = (2/6, 11/12), lod = 0.4]
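The recursive form of vertical selection can be sketched as below, using the lod values of the example tree. `vertical_select` is a hypothetical helper written for illustration only.

```python
# Example cluster tree and level-of-detail values from the slides.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}
LOD = {"A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3,
       "C": 0.0, "D": 0.0, "G": 0.0, "H": 0.0, "I": 0.0, "J": 0.0}

def vertical_select(node, brush_lod):
    """Descend each branch and stop at the first node with lod(n) <= lod(brush)."""
    if LOD[node] <= brush_lod:
        return [node]
    selected = []
    for child in CHILDREN.get(node, []):
        selected.extend(vertical_select(child, brush_lod))
    return selected

print(vertical_select("A", 0.4))
```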
14
Non-Recursive Vertical Selection
Node n satisfies the vertical selection criterion iff:
    lod(n) <= lod(brush) < lod(parent(n))
Each node n has extents (vmin, vmax), so the criterion becomes:
    vmin <= lod(brush) < vmax
Vertical extents in the example:
• A: vmin = 0.6 (upper extent not shown)
• B (0.2, 0.6), E (0.5, 0.6), F (0.3, 0.5)
• C, D (0, 0.2); G, H, I (0, 0.3); J (0, 0.5)
[Figure: cluster tree labeled with the vertical extents of each node; SBB: (e1,e2) = (2/6, 11/12), lod = 0.4]
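A minimal sketch of the non-recursive test: the vertical extents are derived once from lod(n) and lod(parent(n)), and selection becomes a flat filter. The root's upper extent `ROOT_VMAX = 1.0` is an assumption, and the function names are hypothetical.

```python
# Example cluster tree (child -> parent) and level-of-detail values from the slides.
PARENT = {"B": "A", "E": "A", "C": "B", "D": "B",
          "F": "E", "J": "E", "G": "F", "H": "F", "I": "F"}
LOD = {"A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3,
       "C": 0.0, "D": 0.0, "G": 0.0, "H": 0.0, "I": 0.0, "J": 0.0}
ROOT_VMAX = 1.0  # assumption: the root has no parent, so cap its extent at 1.0

def vertical_extents():
    """Offline step: (vmin, vmax) = (lod(n), lod(parent(n))) for every node."""
    return {n: (LOD[n], LOD[PARENT[n]] if n in PARENT else ROOT_VMAX) for n in LOD}

def vertical_select(extents, brush_lod):
    """Online step: keep nodes with vmin <= lod(brush) < vmax; no tree traversal."""
    return sorted(n for n, (vmin, vmax) in extents.items()
                  if vmin <= brush_lod < vmax)

EXTENTS = vertical_extents()
print(vertical_select(EXTENTS, 0.4))
```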
15
Non-Recursive Selection
Selects all nodes that satisfy:
• hmin <= e2 and hmax >= e1
• vmin <= lod(brush) < vmax
[Figure: cluster tree labeled with the horizontal extents (hmin, hmax) and vertical extents (vmin, vmax) of each node; SBB: (e1,e2) = (2/6, 11/12), lod = 0.4]
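Because both conditions are plain comparisons on precomputed columns, the selection can be written as a single non-recursive query. The sketch below uses Python's built-in `sqlite3` as a stand-in database; the table name `node`, its columns, and the root's upper extent of 1.0 are assumptions made for the illustration.

```python
import sqlite3

# (name, hmin, hmax, vmin, vmax) for the example tree; the root's vmax is assumed.
NODES = [
    ("A", 0.0, 1.0, 0.6, 1.0),
    ("B", 0.0, 2/6, 0.2, 0.6),
    ("E", 2/6, 1.0, 0.5, 0.6),
    ("F", 2/6, 5/6, 0.3, 0.5),
    ("C", 0.0, 1/6, 0.0, 0.2),
    ("D", 1/6, 2/6, 0.0, 0.2),
    ("G", 2/6, 3/6, 0.0, 0.3),
    ("H", 3/6, 4/6, 0.0, 0.3),
    ("I", 4/6, 5/6, 0.0, 0.3),
    ("J", 5/6, 1.0, 0.0, 0.5),
]

def select_nodes(conn, e1, e2, lod):
    """Structure-based brush selection as one flat range query."""
    sql = """SELECT name FROM node
             WHERE hmin <= :e2 AND hmax >= :e1
               AND vmin <= :lod AND vmax > :lod
             ORDER BY name"""
    return [row[0] for row in conn.execute(sql, {"e1": e1, "e2": e2, "lod": lod})]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT PRIMARY KEY, "
             "hmin REAL, hmax REAL, vmin REAL, vmax REAL)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", NODES)
print(select_nodes(conn, 2/6, 11/12, 0.4))
```

For the example brush this returns B, F and J. B qualifies only because its hmax equals e1 under the closed comparison; with a strict `hmax > :e1` the answer would be F and J.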
16
2D Hierarchy Map
[Figure: 2D hierarchy map. Each node of the cluster tree (A through J) is drawn as a rectangle spanning its horizontal extents on the x axis (0 to 1 in steps of 1/6) and its vertical extents on the y axis (0, 0.2, 0.3, 0.5, 0.6, 1.0). The brush is drawn between e1 and e2 at height lod; SBB: (e1,e2) = (2/6, 11/12), lod = 0.4.]
17
Properties of 2D Hierarchy Map
Progressive Tree Structure Space Filling Non-Overlapping
[Figure: 2D hierarchy map of nodes A through J; x axis from 0 to 1 in steps of 1/6, y axis with marks at 0.2, 0.3, 0.5, 0.6 and 1.0]
18
Navigation operations in 2D Hierarchy Map
[Figure: 2D hierarchy map with the brush overlaid; nodes intersected by the brush are marked as selected]
20
Spatial Index
• Query Q searches for nodes intersecting the structure-based brush
• Q is a spatial range query over spatial objects
• A spatial index (R-Tree index) can make these searches faster
[Figure: 2D hierarchy map of nodes A through J with the brush overlaid]
26
Next
Caching and Prefetching
27
User Trace Characteristics [Doshi:2003]
Characteristics exploited by prefetching:
• Presence of idle time
• Predictability of user movements (user inertia)
Characteristics exploited by caching:
• Locality of exploration
• Contiguous queries have similar answers
[Figure: 2D hierarchy map of nodes A through J with the brush overlaid]
28
Cache Design
Purpose: minimize system latency
Design issues:
• Cache organization
• Cache lookup policy
• Cache replacement policy
• Computation of remainder queries
29
Cache Organization
• Contiguous chunk of main memory that stores recently fetched nodes
• Each node has a descriptor: horizontal and vertical extents
[Figure: 2D hierarchy map in the database (nodes A through J) next to the 2D hierarchy map of the cache contents (nodes A, E, F, G, H); regions are marked empty or occupied]
30
Cache Lookup
Aim: find nodes in the cache that lie in the current brush
Options: sequential scan, or main-memory spatial index
Main-memory index:
• Advantage: faster cache lookup
• Disadvantage: frequent index updates
[Figure: cache contents (nodes A, E, F, G, H) with the brush overlaid; regions are marked empty, occupied or selected]
31
Cache Replacement Policy
Aim: make room for new nodes by replacing the node with the least probability of being referenced
Approach: exploit general user trace characteristics
• Locality of exploration (spatial locality): Distance policy
• Contiguous queries have similar answers (temporal locality): LRU
32
Distance Replacement Policy
Idea: replace the object furthest away (in 2D space) from the current brush
Distance: length of the line segment that joins the centers of 2 brushes
Realization:
• Maintain a brush store
• Select the victim brush with maximum distance from the current brush
• Replace individual cached nodes in the victim brush
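The victim selection can be sketched as follows. The `Brush` class, the choice of brush center (midpoint of the focus extents at the brush's lod), and the `evict` helper are assumptions made for this illustration; the brush store is modeled as a dictionary from past brushes to the nodes they brought into the cache.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Brush:
    e1: float
    e2: float
    lod: float

    def center(self):
        # Assumption: center = midpoint of the focus extents, at the brush's lod.
        return ((self.e1 + self.e2) / 2, self.lod)

def distance(a, b):
    """Length of the line segment joining the centers of two brushes."""
    return math.dist(a.center(), b.center())

def evict(brush_store, current, cache):
    """Remove the stored brush farthest from `current` and drop its nodes from `cache`."""
    victim = max(brush_store, key=lambda b: distance(b, current))
    for node in brush_store.pop(victim):
        cache.discard(node)
    return victim

b1, b2, b3 = Brush(0.0, 0.2, 0.1), Brush(0.3, 0.5, 0.4), Brush(0.6, 0.9, 0.5)
store = {b1: {"C", "D"}, b2: {"F"}, b3: {"J"}}   # illustrative contents
cache = {"C", "D", "F", "J"}
victim = evict(store, Brush(0.7, 1.0, 0.5), cache)
print(victim, sorted(cache))
```

The sketch ignores nodes shared between several stored brushes; a fuller version would have to decide whether such nodes stay cached.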
33
Distance Replacement Policy
[Figure: database contents (nodes A through J) and cache contents before and after replacement. The brush store holds the earlier brushes b1, b2 and b3; the current brush is b4. Regions are marked empty, occupied or selected.]
34
Computation of Remainder Queries
For each user request, the cache may contain:
• All nodes requested
• A subset of the nodes requested
• None of the nodes requested
[Figure: cache contents (nodes A, E, F, G, H) with the brush and the remainder brush; regions are marked empty, occupied or selected]
35
Computation of Remainder Queries
• Focus extents (e1, e2) of the brush define an interval
• Horizontal extents of cached nodes also form an interval
• The remainder query consists of a set of remainder brushes
• Remainder brush: part of the brush interval not occupied by cached nodes
[Figure: cache contents with the current brush between e1 and e2 and the resulting remainder brush; regions are marked empty, occupied or selected]
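Computing the remainder brushes amounts to subtracting the cached intervals from the brush interval. The sketch below is one way to do this, assuming the cached intervals passed in belong to nodes at the brush's level of detail; `remainder_brushes` is a hypothetical helper.

```python
from fractions import Fraction as Fr

def remainder_brushes(e1, e2, cached):
    """Return the parts of (e1, e2) not covered by the cached (hmin, hmax) intervals."""
    gaps = []
    cursor = e1                      # left edge of the still-uncovered part
    for hmin, hmax in sorted(cached):
        if hmax <= cursor:
            continue                 # entirely left of the uncovered part
        if hmin >= e2:
            break                    # entirely right of the brush
        if hmin > cursor:
            gaps.append((cursor, hmin))
        cursor = max(cursor, hmax)
        if cursor >= e2:
            break
    if cursor < e2:
        gaps.append((cursor, e2))
    return gaps

# Brush (2/6, 11/12) with H (3/6, 4/6) and J (5/6, 1) cached.
print(remainder_brushes(Fr(2, 6), Fr(11, 12), [(Fr(3, 6), Fr(4, 6)), (Fr(5, 6), Fr(1))]))
```

Each returned interval can then be sent to the database as its own, smaller brush query.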
36
Prefetcher [Doshi:2003]
Motivation:
• Presence of idle time
• Predictable user movements
Aim: predict and prefetch future user requests into the cache, to increase the hit ratio or minimize latency
Working model:
[Figure: working model with the components User, GUI front end, user requests, user log, prediction model, prefetcher and cache manager]
37
Directional Prefetcher
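Prediction model:
• Uses the recent history of user requests
• Prefetches in the direction of the last user movement
[Figure: direction strategy. The brush extent e2 moves from its position at time t to its position at time t+1; the prefetch region continues in the same direction.]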
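A direction-based prediction could look like the sketch below. It is an assumption-laden illustration, not the prefetcher of the actual system: it considers only horizontal brush movement, repeats the last shift once, and clamps the result to the range 0 to 1. `predict_next` is a hypothetical name.

```python
def predict_next(history, lo=0.0, hi=1.0):
    """Predict the next brush extents from the last two requests.

    history: list of (e1, e2) brush extents, oldest first.
    Returns None when there is no movement to extrapolate.
    """
    if len(history) < 2:
        return None
    (prev_e1, _), (cur_e1, cur_e2) = history[-2], history[-1]
    shift = cur_e1 - prev_e1          # direction and size of the last movement
    if shift == 0:
        return None
    width = cur_e2 - cur_e1
    # Move once more in the same direction, staying inside [lo, hi].
    next_e1 = min(max(cur_e1 + shift, lo), hi - width)
    return (next_e1, next_e1 + width)

print(predict_next([(0.0, 0.25), (0.25, 0.5)]))
```

The predicted extents would be issued as a prefetch request during idle time, so that the nodes are already cached if the user keeps moving the same way.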
38
System Architecture
[Figure: system architecture. Offline process: flat data, labeling, hierarchical data, database with spatial index. Online components: user, GUI front end, backend controller, cache manager (cache memory, cache index with spatial index or sequential scan, replacement policy LRU or Distance, cache lookup, delta calculator), loader, and prefetch controller with direction prefetcher. Labeled flows: request, answer, query, delta query, data, cached nodes, prefetch request, start/stop.]
39
System Implementation
• Implemented as backend to XmdvTool 6.0
• Language: C++
• Database: Oracle with Oracle Spatial Extension
• Libraries: Spatial Index Library (UC Riverside), OTL (Oracle.. Template library), ZThread
40
Evaluation
Goal: effectiveness of the proposed techniques in isolation and in combination
Workloads (real datasets):
• D1: out5d, size = 20,000, dimensions = 5
• D2: uvw, flow simulation data, size = 200,000, dimensions = 6
Input:
• A set of 4 half-hour real user traces collected in [Doshi:2003apr] for dataset D1
• A set of 4 half-hour synthetic user traces for dataset D2
User trace: a sequence of user requests, where each user request is (position of SBB, time)
41
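Evaluation Metrics
Latency for a user trace:
$$\text{Latency} = \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} T_i}$$
Latency reduction ratio (lrr):
$$\text{lrr} = \frac{\text{Latency}_{\text{base}} - \text{Latency}}{\text{Latency}_{\text{base}}}$$
• L_i = latency for request i
• T_i = number of nodes in request i
• N = number of requests in the trace
Base configuration:
• No index at the database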
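As a small worked example of the two metrics, the sketch below computes the per-node latency of a trace and the resulting latency reduction ratio. The numbers are made up for illustration and are not measurements from the study; the function names are hypothetical.

```python
def trace_latency(latencies, node_counts):
    """Latency of a trace: total latency divided by total number of nodes requested."""
    return sum(latencies) / sum(node_counts)

def lrr(base_latency, latency):
    """Latency reduction ratio relative to the base configuration."""
    return (base_latency - latency) / base_latency

# Illustrative values only: two requests of 10 nodes each.
base = trace_latency([4.0, 6.0], [10, 10])      # base configuration
tuned = trace_latency([1.0, 1.5], [10, 10])     # some improved configuration
print(lrr(base, tuned))
```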
47
Experimental Results: Brief Summary
• Spatial index on the database used alone: lrr 33% for Data Set D1, lrr 72% for Data Set D2
• Cache: lrr 58% for Data Set D1 (cache size = 10%), lrr 94% for Data Set D2 (cache size = 2%)
• Comparison of replacement policies: the Distance replacement policy performs as well as or better than LRU; increase in hit ratio 7%, increase in lrr 2% for Data Set D2
• Main memory index: we need spatial index structures that support high update rates (e.g. LR-Tree [Bozanis:2003])
• Prefetcher and cache: lrr 63% for Data Set D1, lrr 96% for Data Set D2
48
Related Work
• Visualization-database integrated systems: ADR [Kurc:2001], Tioga [Stonebaker:1993], USD [Johnson:1992]
• Caching: Semantic Caching [dar:1996] or Predicate Caching [keller:1996]
• Hierarchy encoding: Nested Interval Method [Celko:2004], Dietz’s numbering scheme [dietz:1982], Dewey Order Encoding [tatxmlorder:2002]
49
Conclusions
• Hierarchy encoding technique
  • Maps tree structures to 2-dimensional spaces
  • Maps visual exploration operations to spatial range queries
• Designed cache to reduce response time
  • Replacement policy: Distance or LRU
  • Cache lookup: sequential or spatial index
• Integrated direction-based prefetcher
• Implemented in the freeware XmdvTool
• Conducted a performance study
50
References
[Doshi:2003] P. Doshi et al. Prefetching for Visual Data Exploration.
[Doshi:2003apr] P. Doshi et al. A strategy selection framework for adaptive prefetching in data visualization.
[Bozanis:2003] P. Bozanis et al. LR-Tree: a logarithmic decomposable spatial index method.
[Celko:2004] J. Celko. Joe Celko’s Trees and Hierarchies in SQL for Smarties.
[Teuhola:1996] J. Teuhola. Path signatures to speed up recursion in relational databases.
[Stonebaker:1993] M. Stonebraker et al. Providing data management support for scientific visualization applications.
[dar:1996] S. Dar et al. Semantic Data Caching and Replacement.
[keller:1996] A.M. Keller et al. A predicate-based caching scheme for client-server database architectures.
[Kurc:2001] T. Kurc et al. Exploration and visualization of large datasets with the active data repository.
[Johnson:1992] M. Goldner et al. Usd: a database management system for scientific research.
[Fua:1999] Y.H. Fua et al. Navigating hierarchies with structure-based brushes.
[dietz:1982] P.F. Dietz. Maintaining order in a linked list.
[tatxmlorder:2002] I. Tatarinov et al. Storing and Querying Ordered XML Using a Relational Database System.
[Stroe:2000] I. Stroe. Scalable Visual Hierarchy Exploration.