Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases Pat Hanrahan With Chris Stolte and Diane Tang Computer Science Department Stanford University

Post on 20-Dec-2015


Polaris

Query, Analysis, and Visualization of

Large Hierarchical Relational Databases

Pat Hanrahan

With Chris Stolte and Diane Tang

Computer Science Department

Stanford University

Motivation

Large databases have become very common

Corporate data warehouses: Amazon, Walmart,…

Scientific projects: Human Genome Project, Sloan Digital Sky Survey

Need tools to extract meaning from these databases

Related Work

Formalisms for graphics

• Bertin’s “Semiology of Graphics”

• Mackinlay’s APT

• Roth et al.’s Sage and SageBrush

• Wilkinson’s “Grammar of Graphics”

Visual exploration of databases

• DeVise

• DataSplash/Tioga-2

Visualization and data mining

• SGI’s MineSet

• IBM’s Diamond

Formalism

Polaris Formalism

UI interpreted as visual specification that defines:

Table configuration

Type of graphic in each pane

Encoding of data as visual properties of marks

Data transformations and queries

Schema

Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product

Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

Coffee chain data [Visual Insights]

Polaris Visual Encodings

Principle of Importance Ordering: Encode the most important information in the most effective way [Cleveland & McGill]

The Pivot Table Interface

Common interface to statistical packages/Excel

Cross-tabulations

Simple interface based on drag-and-drop

Data Cubes

Structure relation as n-dimensional cube

Each cell aggregates all measures for those dimensions

Each cube axis corresponds to a dimension in the relation
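As a minimal sketch of the data cube idea, the following groups a small relation by its dimension columns and sums the measures into one cell per combination. The rows, the values, and the `build_cube` helper are made up for illustration, and summation is only one possible aggregate.

```python
from collections import defaultdict

# Made-up relation: (Quarter, ProductType, Profit, Sales).
rows = [
    ("Qtr1", "Coffee", 10, 100),
    ("Qtr1", "Coffee", 5, 50),
    ("Qtr1", "Tea", 3, 30),
    ("Qtr2", "Coffee", 7, 70),
]

def build_cube(rows, n_dims):
    """Sum every measure into the cell addressed by the first n_dims columns.

    Assumes rows is non-empty and every row has the same width.
    """
    n_measures = len(rows[0]) - n_dims
    cube = defaultdict(lambda: [0] * n_measures)
    for row in rows:
        key, measures = row[:n_dims], row[n_dims:]
        cell = cube[key]
        for i, value in enumerate(measures):
            cell[i] += value
    return dict(cube)

# Two dimensions (Quarter, ProductType) give a 2-dimensional cube.
cube = build_cube(rows, 2)
print(cube[("Qtr1", "Coffee")])
```

Each key of `cube` is a position along the cube axes, and each value holds the aggregated measures for that cell.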

Table Algebra: Operands

Ordinal fields: interpret domain as a set that partitions table into rows and columns:

Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}

Quantitative fields: treat domain as single element set and encode spatially as axes:

Profit = {(Profit)}

Concatenation (+) Operator

Ordered union of two sets

Quarter + ProductType

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}+{(Coffee),(Espresso)}

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}

Profit + Sales

= {(Profit),(Sales)}
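The operands and the concatenation operator above can be sketched in a few lines. This is an illustration rather than the Polaris implementation: the helper names are hypothetical, sets are modeled as ordered lists of tuples, and dropping repeated entries in `concat` is an assumption about what "ordered union" means.

```python
def ordinal(values):
    """Domain of an ordinal field: one 1-tuple per member, in order."""
    return [(v,) for v in values]

def quantitative(name):
    """A quantitative field: a single-element set holding the field name."""
    return [(name,)]

def concat(a, b):
    """Ordered union: entries of a, then entries of b not already present."""
    return a + [t for t in b if t not in a]

quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
product_type = ordinal(["Coffee", "Espresso"])
profit = quantitative("Profit")
sales = quantitative("Sales")

quarter_plus_type = concat(quarter, product_type)
profit_plus_sales = concat(profit, sales)
print(quarter_plus_type)
print(profit_plus_sales)
```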

Cross (×) Operator

Direct product of two sets

Quarter × ProductType =

{(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea),

(Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}

ProductType × Profit = {(Coffee, Profit), (Tea, Profit)}
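The cross operator can be sketched as a direct product that joins the tuples of its two operands. The `cross` helper and the list-of-tuples representation are assumptions made for illustration, not part of Polaris itself.

```python
from itertools import product

def cross(a, b):
    """Direct product: pair every entry of a with every entry of b."""
    return [x + y for x, y in product(a, b)]

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
profit = [("Profit",)]  # a quantitative field is a single-element set

quarter_x_type = cross(quarter, product_type)
type_x_profit = cross(product_type, profit)
print(quarter_x_type)
print(type_x_profit)
```

Crossing an ordinal field with a quantitative field yields one entry per ordinal member, each paired with the measure name, which is what places one axis for that measure in every pane.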

SQL Dataflow

[Dataflow diagram stages: Relational Table, Tuples in Panes, Sort, Marks in Panes]

Notes

Aggregation operators applied after sort

Only one layer is shown; additional z-sort

Multiscale Visualization

Hierarchical Structure

Challenge: these databases are very large

Queries/Vis should not require all the records

Augment database with hierarchical structure

Provide meaningful levels of abstraction

Derived from domain or clustering

Provides metadata (missing data for context)

Hierarchies and Data Cubes

Each dimension in the cube is structured as a tree

Each level in tree corresponds to level of detail
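A small sketch of moving between levels of detail along one dimension tree: values at the Month level are rolled up to Quarter and then to Year by following parent links. The parent maps, the profit figures, and the `roll_up` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical parent links for the Time dimension tree.
month_to_quarter = {"Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1", "Apr": "Qtr2"}
quarter_to_year = {"Qtr1": "Year", "Qtr2": "Year"}

# Made-up measure values at the finest level of detail.
monthly_profit = {"Jan": 10, "Feb": 20, "Mar": 30, "Apr": 5}

def roll_up(values, parent):
    """Aggregate child values into their parent nodes, one level up the tree."""
    totals = defaultdict(int)
    for child, value in values.items():
        totals[parent[child]] += value
    return dict(totals)

quarterly_profit = roll_up(monthly_profit, month_to_quarter)
yearly_profit = roll_up(quarterly_profit, quarter_to_year)
print(quarterly_profit)
print(yearly_profit)
```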

Schema: Star Schema

Fact table: State, Month, Product, Profit, Sales, Payroll, Marketing, Inventory, Margin, ...

Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, ...

Existence tables:

Location: Market, State

Time: Year, Quarter, Month

Products: Product Type, Product Name

Generalizations

• Snowflake schemas

• Lattices (DAGs)

Categorical Hierarchies

Quarter × Month

Direct product of two sets

Would create twelve entries for each quarter, including meaningless ones such as (Qtr1, December)

Quarter / Month

Based on tuples in database not semantics

Would only create three entries per quarter

Can be expensive to compute

Quarter . Month

Based on tuples in existence tables (not db)
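The three ways of combining Quarter and Month can be compared in a short sketch. The data, the existence table, and the helper names `cross`, `nest`, and `dot` are all hypothetical; the point is only that the nest result comes from scanning fact rows, while the dot result is read from the much smaller existence table.

```python
from itertools import product

quarters = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical existence table: one row per valid (Quarter, Month) pair.
existence = [(quarters[i // 3], m) for i, m in enumerate(months)]

# Hypothetical fact-table keys: two records for every month.
fact_rows = [pair for pair in existence for _ in range(2)]

def cross(a, b):
    """Quarter x Month: every quarter paired with every month."""
    return list(product(a, b))

def nest(rows):
    """Quarter / Month: distinct pairs found by scanning every fact row."""
    seen = set(rows)
    return [p for p in cross(quarters, months) if p in seen]

def dot(existence_table):
    """Quarter . Month: pairs read directly from the existence table."""
    return list(existence_table)

crossed = cross(quarters, months)
nested = nest(fact_rows)
dotted = dot(existence)
print(len(crossed), len(nested), len(dotted))
```

With this data the nest and dot results agree, but `nest` has to touch every fact row, which is why it can be expensive on a large database.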

Cartographic Generalization

[Map figures: Canterbury and East Kent at scales 1:50,000 and 1:625,000]

Generalization: Techniques

Selection

Simplification

Exaggeration

Regularization

Displacement

Aggregation

Summary

Polaris

Spreadsheet or table-based displays

Simple drag-and-drop interface

Built on a formalism that allows algebraic manipulation of visual mapping of tuples to marks

Multiscale visualizations using data and visual abstraction

Connects to SQL/MDX servers

See http://www.graphics.stanford.edu/projects/polaris

Future Work

Articulate full set of multiscale design patterns

Transition between levels of detail

Develop system infrastructure for browsing VLDB

Support layers/lenses/linking with tuple flow

Device independence through graphical encodings

Extend formalism to 3D

Couple scientific and information visualization