Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases

Nishant Mehta, Elke A. Rundensteiner and Matt Ward
Computer Science Department
Worcester Polytechnic Institute

IDEAS’05

Thank you to NSF for several IDM grants for XMDV project.

3

XmdvTool: Multivariate Data Visualization

Example: Cars Data Set

MPG   Cyli.   HP    Wt.
18    8       130   3504
17    8       132   3700
...
40    2       100   2500

[Figure: parallel coordinate display of a dataset with 4096 points in XmdvTool 6.0]

4

Hierarchical Displays [Fua:99]

Cars Data Set

Node   MPG     Cyli.   HP       Wt.
Base data points:
C      6       8       130      3504
D      4       8       132      3000
G      10      6       110      2800
H      12      2       70       2100
I      15      2       80       2200
J      15      2       100      2500
Cluster nodes (averages of the base points they contain):
B      5       8       131      3252
F      12.3    3.33    86.66    2366.6
E      13      3       90       2400
A      10.33   4.66    103.66   2684

Structure-based brush components:
• b: level of detail
• d: focus area
• e: focus extents

[Figure: cluster tree with cluster nodes A, B, E, F over the base points C, D, G, H, I, J]

5

Hierarchical Displays

6

Problems: Hierarchical Display

Achieved: Screen space solution to clutter problem

But data handling problem remains …
• Cluster tree size greater than initial tree
• Cluster tree may not fit into main memory
• Structure-based brush semantics involve recursive searches over cluster tree

7

Goal

Overall Goal: Scale hierarchical displays to support navigation over large hierarchies

Subgoals:
• Support navigation over large-scale persistent data
  • Store hierarchies on disk
  • Map navigation operations to efficient queries
• Meet interactive response requirements

8

Overview of Approach

Goals:
• Support navigation operations over large-scale persistent data
• Meet interactive response requirements

Techniques:
• Hierarchy Encoding
• Caching
• Prefetching
• Spatial Indexing

9

Hierarchy Encoding

Problem: Structure-based brush
• Selection semantics involve recursive search
• Recursive search over secondary storage is slow

Solution: Hierarchy encoding
• Push recursive processing into precomputation step
• Precompute label for each node in hierarchy
• Map recursive search to equivalent non-recursive one

[Diagram: Hierarchy Encoding. Hierarchical data passes through a labeling step and is stored in the database.]

10

Structure-Based Brush Semantics [Fua:99]

Node selection based on 2 steps:
• Horizontal Selection: Subtree (e1, e2)
• Vertical Selection: Level of detail (lod)

[Figure: cluster tree (nodes A to J) annotated with level-of-detail values]

11

Horizontal Selection

Aim
• Select subtree that user is interested in viewing

Approach
• Brush focus extents (e1,e2), select set of base points.
• Propagate selection: select parent(n) if n is selected

[Figure: cluster tree (nodes A to J) with the selected leaves and selected clusters highlighted]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4

12

Non-Recursive Horizontal Selection

Offline
• Precompute intervals for each node (hmin, hmax)
• Interval of parent includes interval of child

Online
• Search for nodes that intersect brush interval (e1,e2)

[Figure: cluster tree labeled with horizontal intervals (hmin, hmax): C (0,1/6), D (1/6,2/6), G (2/6,3/6), H (3/6,4/6), I (4/6,5/6), J (5/6,1), B (0,2/6), F (2/6,5/6), E (2/6,1), A (0,1)]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4
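The offline and online steps of horizontal selection can be sketched in Python. This is a minimal illustration, not the authors' implementation: the tree shape is inferred from the example figures, the function names are hypothetical, and the closed comparison (hmin <= e2 and hmax >= e1) follows the condition given later on the Non-Recursive Selection slide. With closed comparisons, a node that only touches the brush at e1 still matches.

```python
from fractions import Fraction

# Example cluster tree (inferred from the figures): parent -> ordered children.
# Nodes without an entry are leaves (base data points).
TREE = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}


def count_leaves(node):
    children = TREE.get(node, [])
    return 1 if not children else sum(count_leaves(c) for c in children)


def label_horizontal(root):
    """Offline step: assign (hmin, hmax) to every node in one recursive pass."""
    total = count_leaves(root)
    intervals = {}
    next_leaf = 0

    def visit(node):
        nonlocal next_leaf
        children = TREE.get(node, [])
        if not children:
            # The i-th leaf, left to right, gets the slot (i/total, (i+1)/total).
            intervals[node] = (Fraction(next_leaf, total),
                               Fraction(next_leaf + 1, total))
            next_leaf += 1
        else:
            spans = [visit(c) for c in children]
            # A parent's interval covers the intervals of all its children.
            intervals[node] = (spans[0][0], spans[-1][1])
        return intervals[node]

    visit(root)
    return intervals


def horizontal_select(intervals, e1, e2):
    """Online step: a flat scan over the labels, with no tree recursion."""
    return sorted(n for n, (hmin, hmax) in intervals.items()
                  if hmin <= e2 and hmax >= e1)


intervals = label_horizontal("A")
selected = horizontal_select(intervals, Fraction(2, 6), Fraction(11, 12))
print(intervals["F"], selected)
```

Only the labeling step recurses over the tree, and it runs once. The online step is a plain filter over precomputed pairs, which is what lets it be expressed as a database query.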

13

Vertical Selection

Aim
• Select points at desired lod (lod handle of SBB)

Approach
• Explore each branch starting at root to find node: lod(n) <= lod(brush)

[Figure: cluster tree annotated with level-of-detail values (A 0.6, B 0.2, E 0.5, F 0.3, leaves 0) and the cut at lod=0.4]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4

14

Non-Recursive Vertical Selection

Node n satisfies vertical selection criteria iff:
lod(n) <= lod(brush) < lod(parent(n))

Each node n has extents (vmin,vmax):
vmin <= lod(brush) < vmax

[Figure: cluster tree labeled with vertical extents (vmin, vmax): B (0.2,0.6), E (0.5,0.6), F (0.3,0.5), J (0,0.5), G, H, I (0,0.3), C, D (0,0.2); the root A has vmin 0.6. lod(brush) = 0.4]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4
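A small Python sketch of the vertical labeling and the resulting non-recursive test. The parent links and level-of-detail values are inferred from the example figures, the function names are hypothetical, and `ROOT_VMAX` is an assumption because the root's upper extent is not stated in the text.

```python
# Parent links and level-of-detail values inferred from the example figures
# (leaves have lod 0).
PARENT = {"B": "A", "E": "A", "C": "B", "D": "B", "F": "E", "J": "E",
          "G": "F", "H": "F", "I": "F"}
LOD = {"A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3,
       "C": 0.0, "D": 0.0, "G": 0.0, "H": 0.0, "I": 0.0, "J": 0.0}
ROOT_VMAX = 1.0  # assumption: upper extent of the root node


def label_vertical():
    """Offline step: vmin = lod(n), vmax = lod(parent(n))."""
    return {n: (LOD[n], LOD[PARENT[n]] if n in PARENT else ROOT_VMAX)
            for n in LOD}


def vertical_select(extents, brush_lod):
    """Online step: keep nodes with vmin <= lod(brush) < vmax."""
    return sorted(n for n, (vmin, vmax) in extents.items()
                  if vmin <= brush_lod < vmax)


extents = label_vertical()
cut = vertical_select(extents, 0.4)
print(extents["F"], cut)
```

Because each node stores its parent's lod as vmax, the test never has to walk up or down the tree. Exactly one node on every root-to-leaf path satisfies it, so the result is a cut through the hierarchy.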

15

Non-Recursive Selection

Selects all nodes that satisfy:
• hmin <= e2 and hmax >= e1
• vmin <= lod(brush) < vmax

[Figure: cluster tree (nodes A to J) with each node labeled by its horizontal interval (hmin, hmax) and vertical extents (vmin, vmax)]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4
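Combining both conditions gives a single flat predicate per node. The sketch below hard-codes the labels of the example tree as inferred from the figures; the table layout, the `select` function and the root's vmax of 1.0 are assumptions for illustration. In the described system the labels would sit in a database table and the predicate would become the WHERE clause of a range query.

```python
from fractions import Fraction as Fr

# node -> (hmin, hmax, vmin, vmax); values inferred from the example figures.
# The root's vmax of 1.0 is an assumption.
LABELS = {
    "A": (Fr(0), Fr(1), 0.6, 1.0),
    "B": (Fr(0), Fr(2, 6), 0.2, 0.6),
    "E": (Fr(2, 6), Fr(1), 0.5, 0.6),
    "F": (Fr(2, 6), Fr(5, 6), 0.3, 0.5),
    "C": (Fr(0), Fr(1, 6), 0.0, 0.2),
    "D": (Fr(1, 6), Fr(2, 6), 0.0, 0.2),
    "G": (Fr(2, 6), Fr(3, 6), 0.0, 0.3),
    "H": (Fr(3, 6), Fr(4, 6), 0.0, 0.3),
    "I": (Fr(4, 6), Fr(5, 6), 0.0, 0.3),
    "J": (Fr(5, 6), Fr(1), 0.0, 0.5),
}


def select(labels, e1, e2, lod):
    """One pass over the label table; no recursion over the tree."""
    return sorted(
        node for node, (hmin, hmax, vmin, vmax) in labels.items()
        if hmin <= e2 and hmax >= e1      # horizontal: interval meets the brush
        and vmin <= lod < vmax            # vertical: node lies on the lod cut
    )


# B only touches the brush at e1 = 2/6; the closed comparison includes it.
print(select(LABELS, Fr(2, 6), Fr(11, 12), 0.4))
```

Moving e1 slightly to the right, for example to 1/2, drops B from the result and leaves F and J.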

16

2D Hierarchy Map

[Figure: the cluster tree redrawn as a 2D hierarchy map. Each node becomes a rectangle spanning its horizontal interval (hmin, hmax) on the horizontal axis (0 to 1 in steps of 1/6) and its vertical extents (vmin, vmax) on the level-of-detail axis (ticks at 0, 0.2, 0.3, 0.5, 0.6, 1.0). The brush is drawn from e1 to e2 at height lod.]

SBB: (e1,e2) = (2/6, 11/12), lod=0.4

17

Properties of 2D Hierarchy Map

Progressive Tree Structure Space Filling Non-Overlapping

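[Figure: 2D hierarchy map with rectangles for nodes A, B, E, F and leaves C, D, G, H, I, J; horizontal axis 0 to 1 in steps of 1/6; vertical axis ticks at 0.2, 0.3, 0.5, 0.6, 1.0]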

18

Navigation operations in 2D Hierarchy Map

[Figure: 2D hierarchy map with the brush overlaid; the nodes that intersect the brush are marked as selected]

20

Spatial Index

• Q searches for nodes intersecting structure-based brush
• Q is spatial range query over spatial objects
• A spatial index (R-Tree index) can speed up these searches

[Figure: 2D hierarchy map with the brush overlaid]

26

Next

Caching and Prefetching

27

User Trace Characteristics [Doshi:2003]

• Locality of exploration; contiguous queries have similar answers → Caching
• Presence of idle time; predictability of user movements (User Inertia) → Prefetching

[Figure: 2D hierarchy map with the brush overlaid]

28

Cache Design

Purpose
• Minimize system latency

Design Issues
• Cache Organization
• Cache Lookup Policy
• Cache Replacement Policy
• Computation of Remainder Queries

29

Cache Organization

• Contiguous chunk of main memory that stores recently fetched nodes
• Each node has a descriptor: Horizontal and Vertical Extents

[Figure: 2D hierarchy map in database next to 2D hierarchy map of cache contents; regions of the cache map are marked empty or occupied]

30

Cache Lookup

Aim: Find nodes in cache that lie in current brush

Cache Lookup
• Sequential scan, or
• Main memory spatial index

Main Memory Index
• Advantage: Faster cache lookup
• Disadvantage: Frequent index updates

[Figure: 2D hierarchy map of cache contents with the brush overlaid; regions marked empty, occupied, or selected]

31

Cache Replacement Policy

Aim: Make room for new nodes
• Replace node with least probability of being referenced.

Approach
• Exploit general user trace characteristics
• Locality of Exploration → Spatial Locality → Distance
• Contiguous queries have similar answers → Temporal Locality → LRU

32

Distance Replacement Policy

Idea
• Replace object furthest away (2D space) from current brush

Distance: Length of line segment that joins center of 2 brushes.

Realization:
• Maintain brush store
• Select victim brush with max distance from current brush
• Replace individual cached nodes in victim brush
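Victim selection under the distance policy can be sketched as follows. The `Brush` record, the function names and the sample coordinates are hypothetical. Taking a brush's center to be the midpoint of (e1, e2) at height lod is an assumption, since the text only says the distance joins the centers of two brushes.

```python
import math
from collections import namedtuple

# A brush on the 2D hierarchy map. Using the midpoint of (e1, e2) at height
# lod as its center is an assumption made for this sketch.
Brush = namedtuple("Brush", ["name", "e1", "e2", "lod"])


def center(brush):
    return ((brush.e1 + brush.e2) / 2.0, brush.lod)


def distance(a, b):
    """Length of the line segment joining the centers of two brushes."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def pick_victim(brush_store, current):
    """Return the stored brush that lies farthest from the current brush."""
    return max(brush_store, key=lambda b: distance(b, current))


# Illustrative brush store; the coordinates are made up.
brush_store = [
    Brush("b1", 0.0, 0.2, 0.1),
    Brush("b2", 0.3, 0.6, 0.4),
    Brush("b3", 0.5, 0.9, 0.3),
]
current = Brush("b4", 0.6, 1.0, 0.4)
victim = pick_victim(brush_store, current)
print(victim.name)
```

The nodes cached for the victim brush would then be evicted one by one until enough room is available. Keeping distances per brush rather than per node keeps the bookkeeping small.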

33

Distance Replacement Policy

[Figure: worked example showing the database contents, the cache contents before and after replacement, the current brush, and a brush store holding brushes b1, b2, b3 and b4; regions are marked empty, occupied, or selected]

34

Computation of Remainder Queries

For each user request cache may contain:
• All nodes requested
• A subset of nodes requested
• None of nodes requested

[Figure: cache contents with the brush and the remainder brush overlaid; regions marked empty, occupied, or selected]

35

Computation of Remainder Queries

• Focus extents (e1,e2) of brush define interval
• Horizontal extents of cached nodes also form an interval
• Remainder query consists of a set of remainder brushes
• Remainder brush: Part of brush interval not occupied by cache nodes

[Figure: cache contents with the current brush (e1, e2) and the remainder brush overlaid; regions marked empty, occupied, or selected]
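Computing the remainder brushes amounts to subtracting the cached intervals from the brush interval. The function below is a sketch with a hypothetical name and signature. It assumes the cached nodes passed in have already been filtered to the brush's level of detail, so only horizontal extents matter.

```python
from fractions import Fraction as Fr


def remainder_brushes(e1, e2, cached_intervals):
    """Return the parts of [e1, e2] not covered by any cached (hmin, hmax)."""
    gaps = []
    cursor = e1  # left edge of the part of the brush not yet accounted for
    for hmin, hmax in sorted(cached_intervals):
        if hmax <= cursor:
            continue        # lies entirely left of the unaccounted part
        if hmin >= e2:
            break           # this and all later intervals lie right of the brush
        if hmin > cursor:
            gaps.append((cursor, hmin))   # uncovered stretch before this node
        cursor = max(cursor, hmax)
        if cursor >= e2:
            break
    if cursor < e2:
        gaps.append((cursor, e2))         # uncovered tail of the brush
    return gaps


# Brush (2/6, 11/12) with nodes covering (2/6, 3/6) and (4/6, 5/6) in the cache.
cached = [(Fr(2, 6), Fr(3, 6)), (Fr(4, 6), Fr(5, 6))]
print(remainder_brushes(Fr(2, 6), Fr(11, 12), cached))
```

An empty result means the request is a full cache hit, and a single gap equal to the whole brush means a full miss. Anything in between yields the set of remainder brushes to send to the database.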

36

Prefetcher [Doshi:03]

Motivation
• Presence of idle time
• Predictable user movements

Aim: Predict and prefetch future user requests into cache
• Increase hit ratio or minimize latency

Working Model:
[Diagram: User, GUI, Front End, User Requests, User Log, Prediction Model, Prefetcher, Cache Manager]

37

Directional Prefetcher

Prediction Model
• Uses recent history of user requests
• Prefetches in direction of last user movement

[Figure: Direction Strategy, showing the brush position at times t and t+1 and the region prefetched in the direction of movement]
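A direction-based prediction step might look like the sketch below. The text only states that prefetching follows the direction of the last user movement. Reusing the last displacement as the step size, clamping to the range [0, 1], and the function name are all assumptions made for illustration.

```python
def clamp(x):
    """Keep a focus extent inside the horizontal range of the hierarchy map."""
    return min(1.0, max(0.0, x))


def predict_next_brush(previous, current):
    """Shift the current (e1, e2) by the last observed movement.

    Returns None when the brush did not move, since no direction is known.
    The step size (equal to the last displacement) is an assumption.
    """
    (p1, p2), (c1, c2) = previous, current
    d1, d2 = c1 - p1, c2 - p2
    if d1 == 0 and d2 == 0:
        return None
    return (clamp(c1 + d1), clamp(c2 + d2))


# The brush moved right by 0.125 between two requests; predict one more step.
print(predict_next_brush((0.25, 0.5), (0.375, 0.625)))
```

During idle time the predicted extents would be turned into a prefetch request, so that a continued movement in the same direction is served from the cache.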

38

System Architecture

[Diagram: system architecture. Components and labels recoverable from the slide:]
• Front End: User, GUI
• Backend Controller
• Cache Manager: Cache Lookup, Cache Index (Spatial Index or Seq. Scan), Cache Memory, Rep. Policy (LRU or Distance)
• Delta Calculator
• Loader
• Prefetch Controller and Direction Prefetcher
• Database with Spatial Index
• Offline process: Labeling (Hierarchical Data to Flat Data)
• Message labels: Request, Answer, query, Delta query, Cached Nodes, data, Start/Stop, Prefetch Request

39

System Implementation

• Implemented as backend to XmdvTool 6.0
• Language: C++
• Database: Oracle with Oracle Spatial Extension
• Libraries:
  • Spatial Index Library (UC Riverside)
  • OTL (Oracle.. Template library)
  • ZThread

40

Evaluation

Goal: Effectiveness of Proposed Techniques in Isolation and in Combination

Workloads:

Real Datasets
• D1, out5d, size = 20,000, dimensions = 5
• D2, uvw, flow simulation data, size = 200,000, dimensions = 6

Input
• A set of 4 half-hour real user traces collected in [Doshi:2003apr] for dataset D1.
• A set of 4 half-hour synthetic user traces for dataset D2

User Trace
• Sequence of user requests.
• Each user request: (position of SBB, time)

41

Evaluation Metrics

Latency for User Trace

\[ \text{Latency} = \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} T_i} \]

• Li = Latency for request i.
• Ti = Number of nodes in request i

Latency Reduction Ratio (lrr)

\[ \text{lrr} = \frac{\text{Latency}_{\text{base}} - \text{Latency}}{\text{Latency}_{\text{base}}} \]

Base Configuration
• No Index at the database
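The two metrics are easy to compute from a trace log. The sketch below assumes the trace latency is the total latency divided by the total number of requested nodes, and that lrr is the relative reduction against the base configuration. The function names and the sample numbers are made up for illustration.

```python
def trace_latency(latencies, node_counts):
    """Latency for a user trace: sum of Li divided by sum of Ti."""
    return sum(latencies) / sum(node_counts)


def latency_reduction_ratio(base_latency, latency):
    """lrr: fraction of the base configuration's latency that was removed."""
    return (base_latency - latency) / base_latency


# Made-up trace of two requests, each returning two nodes.
base = trace_latency([8.0, 12.0], [2, 2])      # base configuration
tuned = trace_latency([2.0, 3.0], [2, 2])      # configuration under test
print(base, tuned, latency_reduction_ratio(base, tuned))
```

An lrr of 0 means no improvement over the base configuration, and values close to 1 mean almost all of the base latency was eliminated. The percentages in the results summary correspond to this ratio multiplied by 100.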

47

Experimental Results: Brief Summary

Spatial Index on the database used alone
• lrr 33% for Data Set D1
• lrr 72% for Data Set D2

Cache
• lrr 58% for Data Set D1 (Cache Size = 10%)
• lrr 94% for Data Set D2 (Cache Size = 2%)

Comparison of Replacement Policies
• Distance replacement policy performs as well as or better than LRU
• Increase in hit ratio 7%, increase in lrr 2% for Data Set D2

Main Memory Index
• We need spatial index structures that support high update rates (e.g. LR-Tree [Bozanis:2003])

Prefetcher and Cache
• lrr 63% for Data Set D1
• lrr 96% for Data Set D2

48

Related Work

Visualization-database integrated systems
• ADR [Kurc:2001]
• Tioga [Stonebaker:1993]
• USD [Johnson:1992]

Caching
• Semantic Caching [dar:1996] or Predicate Caching [keller:1996]

Hierarchy Encoding
• Nested Interval Method [Celko:2004]
• Dietz’s numbering scheme [dietz:1982]
• Dewey Order Encoding [tatxmlorder:2002]

49

Conclusions

• Hierarchy encoding technique
  • Maps tree structures to 2-dimensional spaces
  • Maps visual exploration operations to spatial range queries
• Designed cache to reduce response time
  • Replacement Policy: Distance or LRU
  • Cache Lookup: Sequential or Spatial Index
• Integrated direction-based prefetcher
• Implemented in free-ware XMDV Tool
• Conducted a performance study

50

References

[Doshi:2003] P. Doshi et al. Prefetching for Visual Data Exploration
[Doshi:2003apr] P. Doshi et al. A strategy selection framework for adaptive prefetching in data visualization
[Bozanis:2003] P. Bozanis et al. LR-Tree: a logarithmic decomposable spatial index method
[Celko:2004] J. Celko. Joe Celko’s Trees and Hierarchies in SQL for Smarties
[Teuhola:1996] J. Teuhola. Path signatures to speed up recursion in relational databases
[Stonebaker:1993] M. Stonebraker et al. Providing data management support for scientific visualization applications
[dar:1996] S. Dar et al. Semantic Data Caching and Replacement
[keller:1996] A.M. Keller et al. A predicate-based caching scheme for client-server database architectures
[Kurc:2001] T. Kurc et al. Exploration and visualization of large datasets with the active data repository
[Johnson:1992] M. Goldner et al. USD: a database management system for scientific research
[Fua:1999] Y.H. Fua et al. Navigating hierarchies with structure-based brushes
[dietz:1982] P.F. Dietz. Maintaining order in a linked list
[tatxmlorder:2002] I. Tatarinov et al. Storing and Querying Ordered XML Using a Relational Database System
[Stroe:2000] I. Stroe. Scalable Visual Hierarchy Exploration
