Graph OLAP: Towards Online Analytical Processing on Graphs

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu
University of Illinois at Urbana-Champaign; IBM T. J. Watson Research Center; University of Illinois at Chicago


Post on 25-Feb-2016


TRANSCRIPT

Page 1: Graph OLAP: Towards Online Analytical Processing on Graphs

Graph OLAP: Towards Online Analytical Processing on Graphs
Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu
University of Illinois at Urbana-Champaign
IBM T. J. Watson Research Center
University of Illinois at Chicago

Page 2: Graph OLAP: Towards Online Analytical Processing on Graphs

Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion

Page 3: Graph OLAP: Towards Online Analytical Processing on Graphs

Online Analytical Processing
Jim Gray, 1997
OLAP as a powerful analytical tool

Page 4: Graph OLAP: Towards Online Analytical Processing on Graphs

The Usefulness of OLAP
Multi-dimensional
  Different perspectives
Multi-level
  Different granularities
Can we offer roll-up/drill-down and slice/dice on graph data?
Traditional OLAP cannot handle this, because it ignores links among data objects

Page 5: Graph OLAP: Towards Online Analytical Processing on Graphs

The Prevalence of Graphs
Chemical compounds, computer vision objects, circuits, XML
Especially various information networks
  Biological networks
  Bibliographic networks
  Social networks
  World Wide Web (WWW)

Page 6: Graph OLAP: Towards Online Analytical Processing on Graphs

Applications
WWW
  >= 3 billion nodes, >= 50 billion arcs
Facebook
  >= 100 million active users
Combining topological structures and node/edge attributes
Great challenge to view and analyze them
We propose Graph OLAP to tackle this issue

Page 7: Graph OLAP: Towards Online Analytical Processing on Graphs

Scenario #1
A bibliographic network
The collaboration patterns among researchers for SIGMOD 2004

Page 8: Graph OLAP: Towards Online Analytical Processing on Graphs
Page 9: Graph OLAP: Towards Online Analytical Processing on Graphs

Scenario #2

Page 10: Graph OLAP: Towards Online Analytical Processing on Graphs

Outline
Motivation
Framework
  Data Model
  Two types of Graph OLAP
  Dimension, Measure and OLAP operations
Efficient Computation
Experiments
Conclusion

Page 11: Graph OLAP: Towards Online Analytical Processing on Graphs

Data Model
We have a collection of network snapshots $\mathcal{G} = \{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_N\}$
Each snapshot $\mathcal{G}_i = (I_{1,i}, I_{2,i}, \ldots, I_{k,i}; G_i)$
  $I_{1,i}, I_{2,i}, \ldots, I_{k,i}$ are $k$ informational attributes describing the snapshot as a whole
  $G_i = (V_i, E_i)$ is an attributed graph, with attributes attached to its nodes $V_i$ and edges $E_i$
Since $G_1, G_2, \ldots, G_N$ only represent different observations of a network, $V_1, V_2, \ldots, V_N$ actually correspond to the same set of objects
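One possible way to hold this data model in memory is sketched below. The class names `AttributedGraph` and `Snapshot`, the helper `same_node_set`, and the toy values are all assumptions made for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    # node -> attribute dict, e.g. {"institute": "X"}
    nodes: dict = field(default_factory=dict)
    # (u, v) -> attribute dict, e.g. {"weight": 2}
    edges: dict = field(default_factory=dict)


@dataclass
class Snapshot:
    info: dict              # informational attributes I_1 .. I_k of the snapshot
    graph: AttributedGraph  # the attributed graph G_i = (V_i, E_i)


def same_node_set(snapshots):
    """Check the assumption that all snapshots describe the same objects."""
    first = set(snapshots[0].graph.nodes)
    return all(set(s.graph.nodes) == first for s in snapshots)


# Toy data (hypothetical): two observations of the same two-node network.
g1 = AttributedGraph(nodes={"a": {}, "b": {}}, edges={("a", "b"): {"weight": 1}})
g2 = AttributedGraph(nodes={"a": {}, "b": {}}, edges={})
snapshots = [
    Snapshot({"venue": "SIGMOD", "year": 2004}, g1),
    Snapshot({"venue": "VLDB", "year": 2004}, g2),
]
```

Keeping the informational attributes outside the graph mirrors the split between the snapshot-level attributes and the node/edge attributes described on this slide.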

Page 12: Graph OLAP: Towards Online Analytical Processing on Graphs

Two Types of OLAP
Informational OLAP (abbr. I-OLAP)
Topological OLAP (abbr. T-OLAP)

Page 13: Graph OLAP: Towards Online Analytical Processing on Graphs

Informational OLAP
Dimensions come from informational attributes attached at the whole snapshot level, so-called Info-Dims
  e.g., scenario #1

Page 14: Graph OLAP: Towards Online Analytical Processing on Graphs

I-OLAP Characteristics
Overlay multiple pieces of information
Do not change the objects whose interactions are being looked at
  In the underlying snapshots, each node is a researcher
  In the summarized view, each node is still a researcher
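A minimal sketch of the overlay idea, assuming each snapshot is an undirected edge map whose weights are collaboration counts and that overlaying means summing those weights. The function name `overlay` and the researcher names are hypothetical.

```python
from collections import Counter


def overlay(edge_maps):
    """I-aggregate snapshots by summing edge weights; nodes are left untouched."""
    total = Counter()
    for edges in edge_maps:
        for (u, v), w in edges.items():
            # Store undirected edges under one canonical key.
            total[tuple(sorted((u, v)))] += w
    return dict(total)


# Toy snapshots (hypothetical data): two years of one venue.
sigmod_2004 = {("alice", "bob"): 2}
sigmod_2005 = {("bob", "alice"): 1, ("alice", "carol"): 1}

summary = overlay([sigmod_2004, sigmod_2005])
researchers = {n for edge in summary for n in edge}
```

Every node in `summary` is still an individual researcher; only the edge weights have been combined.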

Page 15: Graph OLAP: Towards Online Analytical Processing on Graphs

Topological OLAP
Dimensions come from the node/edge attributes inside individual networks, so-called Topo-Dims
  e.g., scenario #2

Page 16: Graph OLAP: Towards Online Analytical Processing on Graphs

T-OLAP Characteristics
Zoom in/Zoom out
Network topology changed: “generalized” nodes and “generalized” edges
  In the underlying network, each node is a researcher
  In the summarized view, each node becomes an institute that comprises multiple researchers
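The zoom-out step can be sketched as follows, assuming a node-to-institute mapping is available and that the weight of a generalized edge is the sum of the merged edge weights. The function `t_rollup`, the institute labels, and the choice to keep intra-institute edges as self-loops are illustrative assumptions.

```python
from collections import Counter


def t_rollup(edges, group_of):
    """Merge nodes by group_of[node] and sum edge weights between groups."""
    agg = Counter()
    for (u, v), w in edges.items():
        gu, gv = group_of[u], group_of[v]
        key = (gu, gv) if gu <= gv else (gv, gu)
        agg[key] += w  # gu == gv yields a self-loop on the generalized node
    return dict(agg)


# Toy network (hypothetical data): researchers and their institutes.
edges = {("alice", "bob"): 2, ("alice", "carol"): 1, ("bob", "dave"): 4}
institute = {"alice": "X", "bob": "X", "carol": "Y", "dave": "Y"}

by_institute = t_rollup(edges, institute)
```

Unlike the I-OLAP overlay, the node set itself changes here: four researchers collapse into two generalized institute nodes.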

Page 17: Graph OLAP: Towards Online Analytical Processing on Graphs

Measures in Graph OLAP
Measure is an aggregated graph
  I-aggregated graph
  T-aggregated graph
Other measures like node count, average degree, etc. can be treated as derived
Graph plays a dual role
  Data source
  Aggregate measure

Page 18: Graph OLAP: Towards Online Analytical Processing on Graphs

Generality of the Framework
Measures could be complex
  e.g., maximum flow, shortest path, centrality
Combine I-OLAP and T-OLAP into a hybrid case

Page 19: Graph OLAP: Towards Online Analytical Processing on Graphs

Graph OLAP Operations
Roll-up
  Graph I-OLAP: Overlay multiple snapshots to form a higher-level summary via I-aggregated graph
  Graph T-OLAP: Shrink the topology and obtain a T-aggregated graph that represents a compressed view, whose topological elements (i.e., nodes and/or edges) have been merged and replaced by corresponding higher-level ones
Drill-down
  Graph I-OLAP: Return to the set of lower-level snapshots from the higher-level overlaid (aggregated) graph
  Graph T-OLAP: A reverse operation of roll-up
Slice/dice
  Graph I-OLAP: Select a subset of qualifying snapshots based on Info-Dims
  Graph T-OLAP: Select a subgraph of the network based on Topo-Dims
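The two slice/dice variants might look like this in code. This is a sketch under assumed data shapes (snapshots as dicts with `info` and `edges`, node attributes as a separate dict); `i_slice` and `t_slice` are hypothetical helper names.

```python
def i_slice(snapshots, **info_dims):
    """I-OLAP slice/dice: keep snapshots whose Info-Dims match the given values."""
    return [s for s in snapshots
            if all(s["info"].get(k) == v for k, v in info_dims.items())]


def t_slice(edges, node_attrs, **topo_dims):
    """T-OLAP slice/dice: keep the subgraph induced by nodes matching the Topo-Dims."""
    keep = {n for n, attrs in node_attrs.items()
            if all(attrs.get(k) == v for k, v in topo_dims.items())}
    return {(u, v): w for (u, v), w in edges.items() if u in keep and v in keep}


# Toy data (hypothetical).
snapshots = [
    {"info": {"venue": "SIGMOD", "year": 2004}, "edges": {("alice", "bob"): 2}},
    {"info": {"venue": "VLDB", "year": 2004}, "edges": {("alice", "carol"): 1}},
]
node_attrs = {
    "alice": {"institute": "X"},
    "bob": {"institute": "X"},
    "carol": {"institute": "Y"},
}

sigmod_only = i_slice(snapshots, venue="SIGMOD")
inside_x = t_slice({("alice", "bob"): 2, ("alice", "carol"): 1},
                   node_attrs, institute="X")
```

The I-OLAP version filters whole snapshots, while the T-OLAP version filters inside a single network.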

Page 20: Graph OLAP: Towards Online Analytical Processing on Graphs

Outline
Motivation
Framework
Efficient Computation
  Measure classification
  Optimizations
  Constraint pushing
Experiments
Conclusion

Page 21: Graph OLAP: Towards Online Analytical Processing on Graphs

Two Categories of Strategies
Top-down
  Generalized cells later
  How to combine and leverage intermediate results?
Bottom-up
  Generalized cells first
  How to early-stop?

Page 22: Graph OLAP: Towards Online Analytical Processing on Graphs

Measure Classification
How to combine and leverage intermediate results?
Distributive
  The computation of high-level cells can be directly built on low-level cells
Algebraic
  Not distributive, but can be easily derived from several distributive measures
Holistic
  Neither distributive nor algebraic

Page 23: Graph OLAP: Towards Online Analytical Processing on Graphs

Examples
Distributive: collaboration frequency
  Use distributiveness to drive computation up the cuboid lattice
Algebraic: maximum flow
  Will prove later
  Semi-distributive
Holistic: centrality
  Need to go down to the raw data and start from scratch
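The distributive case can be illustrated with a small check: rolling up collaboration frequency through an intermediate cuboid gives the same result as aggregating the raw snapshots directly. The cell keys, the toy counts, and the helper `overlay` are assumptions for this sketch; edge keys are assumed to be stored in one canonical order.

```python
from collections import Counter


def overlay(graphs):
    """Sum edge weights across a collection of edge maps."""
    total = Counter()
    for g in graphs:
        total.update(g)  # Counter.update adds counts rather than replacing them
    return total


# Base cuboid (hypothetical data): one cell per (venue, year).
base = {
    ("SIGMOD", 2004): {("alice", "bob"): 2},
    ("SIGMOD", 2005): {("alice", "bob"): 1},
    ("VLDB", 2004): {("alice", "bob"): 3, ("bob", "carol"): 1},
}

# Intermediate cuboid: one cell per venue, all years.
cells_by_venue = {}
for (venue, _year), g in base.items():
    cells_by_venue.setdefault(venue, []).append(g)
per_venue = {venue: overlay(gs) for venue, gs in cells_by_venue.items()}

# Apex cell computed two ways: from the intermediate cells and from raw data.
apex_from_cells = overlay(per_venue.values())
apex_from_raw = overlay(base.values())
```

Because the two results agree, higher-level cells never need to revisit the raw snapshots for this measure. A holistic measure such as centrality offers no such shortcut.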

Page 24: Graph OLAP: Towards Online Analytical Processing on Graphs

Optimizations
Special measures may have special properties that can help optimize the calculations
We discuss two of them here, with regard to I-OLAP
  Localization
  Attenuation

Page 25: Graph OLAP: Towards Online Analytical Processing on Graphs

Localization
During computation, only a neighborhood of the networks needs to be consulted
  e.g., the collaboration frequency of “R. Agrawal” and “R. Srikant” for [sigmod, all-years] only depends on their collaboration frequencies in each SIGMOD conference
Perfect (i.e., 0-neighborhood) localization
k-neighborhood is less ideal, but still useful
  e.g., # of common friends shared by “R. Agrawal” and “R. Srikant”
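A sketch of the two localization levels, assuming each snapshot is a symmetric adjacency dict and that "common friends" means common neighbors in the overlaid graph. The function names and the toy researchers are hypothetical.

```python
def pair_frequency(snapshots, u, v):
    """0-neighborhood: only the (u, v) edge of each snapshot is consulted."""
    return sum(g.get(u, {}).get(v, 0) for g in snapshots)


def common_neighbors(snapshots, u, v):
    """1-neighborhood: only the adjacency lists of u and v are consulted."""
    nu, nv = set(), set()
    for g in snapshots:
        nu |= set(g.get(u, {}))
        nv |= set(g.get(v, {}))
    return (nu & nv) - {u, v}


# Toy snapshots (hypothetical data), stored as symmetric adjacency dicts.
s1 = {"alice": {"bob": 2, "carol": 1}, "bob": {"alice": 2}, "carol": {"alice": 1}}
s2 = {"alice": {"bob": 1}, "bob": {"alice": 1, "carol": 1}, "carol": {"bob": 1}}

freq = pair_frequency([s1, s2], "alice", "bob")
shared = common_neighbors([s1, s2], "alice", "bob")
```

In this toy data, carol is a common neighbor only after overlaying: she is linked to alice in one snapshot and to bob in the other. That is why the 1-neighborhood of both endpoints has to be read, although the rest of the network can still be ignored.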

Page 26: Graph OLAP: Towards Online Analytical Processing on Graphs

Attenuation
Consider the transporting capability (i.e., maximum flow) from source S to destination T
Multiple transportation networks, each one operated by a separate company
With regard to I-OLAP, each network is a “snapshot”, and overlaying more than one snapshot means sharing link capacities among companies

Page 27: Graph OLAP: Towards Online Analytical Processing on Graphs

Attenuation
Data graph C
  Node: cities
  Edge: capacity of a link
Measure graph F
  Node: cities
  Edge: when maximum flow is transmitted, the quantity that passes through a link

Page 28: Graph OLAP: Towards Online Analytical Processing on Graphs

Attenuation
Maximum flow is algebraic
  F can be derived from C
    Just run the maximum flow algorithm
  The capacity graph C is obviously distributive
Lemma
  Let $F$ be a flow in $C$ and let $C_F$ be its residual graph, where residual means that $C_F = C - F$; then $F'$ is a maximum flow in $C_F$ if and only if $F + F'$ is a maximum flow in $C$

Page 29: Graph OLAP: Towards Online Analytical Processing on Graphs

Attenuation
Consider two snapshots that are overlaid
  Maximum flows $F_1$, $F_2$ already calculated from $C_1$, $C_2$
Without attenuation
  Compute the overall maximum flow $F$ from $C_1 + C_2$
With attenuation
  Take $F_1 + F_2$ as basis
  Compute the residual maximum flow $F'$ from $(C_1 - F_1) + (C_2 - F_2)$, and augment it onto $F_1 + F_2$
Thus, our input attenuates from $C_1 + C_2$ to $(C_1 + C_2) - (F_1 + F_2)$, which substantially decreases the effort
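The attenuation idea can be checked on a toy instance. The sketch below uses a plain Edmonds-Karp routine as a stand-in for whatever maximum flow algorithm is actually used, and interprets the residual C - F in the standard way, with reverse edges that let existing flow be cancelled. All function names and capacities are assumptions for illustration.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp. cap maps directed (u, v) -> capacity. Returns (value, net flow)."""
    res = defaultdict(int)  # residual capacities
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction is usable once flow exists
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        value += push
    flow = {e: c - res[e] for e, c in cap.items() if c - res[e] > 0}
    return value, flow


def add(a, b):
    """Edge-wise sum of two capacity (or flow) maps."""
    out = defaultdict(int)
    for d in (a, b):
        for e, x in d.items():
            out[e] += x
    return dict(out)


def residual(cap, flow):
    """C - F, including reverse edges that allow flow to be cancelled."""
    r = defaultdict(int)
    for e, c in cap.items():
        r[e] += c
    for (u, v), f in flow.items():
        r[(u, v)] -= f
        r[(v, u)] += f
    return {e: c for e, c in r.items() if c > 0}


# Toy capacity graphs (hypothetical data): two companies on the route s -> a -> t.
C1 = {("s", "a"): 3, ("a", "t"): 1}
C2 = {("s", "a"): 1, ("a", "t"): 3}

v1, F1 = max_flow(C1, "s", "t")
v2, F2 = max_flow(C2, "s", "t")

# Without attenuation: recompute on the overlaid capacities.
v_direct, _ = max_flow(add(C1, C2), "s", "t")

# With attenuation: only the leftover capacity is searched.
v_extra, _ = max_flow(add(residual(C1, F1), residual(C2, F2)), "s", "t")
v_attenuated = v1 + v2 + v_extra
```

Here each company alone can ship 1 unit, the residual search finds 2 more, and the total of 4 matches the direct computation on the overlaid network. Only flow values are compared; combining the flow maps themselves would additionally require cancelling any flow that the residual step routes along reverse edges.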

Page 30: Graph OLAP: Towards Online Analytical Processing on Graphs

Constraint Pushing
Iceberg graph cube
  Partial materialization
  Satisfying some interestingness requirement
Push the constraints
  Anti-monotone
    e.g., maximum flow $|f| \geq \delta_{|f|}$
  Monotone
    e.g., diameter $d \geq \delta_d$

Page 31: Graph OLAP: Towards Online Analytical Processing on Graphs

Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion

Page 32: Graph OLAP: Towards Online Analytical Processing on Graphs

OLAP a Bibliographic Network
We get the coauthorship data from DBLP
Measure
  Information Centrality
Two Info-Dims
  Area
    Database (DB): PODS/SIGMOD/VLDB/ICDE/EDBT
    Data Mining (DM): ICDM/SDM/KDD/PKDD
    Information Retrieval (IR): SIGIR/WWW/CIKM
  Time

Page 33: Graph OLAP: Towards Online Analytical Processing on Graphs

OLAP a Bibliographic Network

Page 34: Graph OLAP: Towards Online Analytical Processing on Graphs

Efficiency
A test that computes maximum flow as the measure
Synthetically generate flow networks
  Details in the paper, with each “snapshot” representing an individual player in the transportation industry
Like the Multi-Way method, calculate low-level cells before merging them into high-level ones
  One takes advantage of the attenuation heuristic
  The other does not

Page 35: Graph OLAP: Towards Online Analytical Processing on Graphs

Efficiency

Page 36: Graph OLAP: Towards Online Analytical Processing on Graphs

Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion

Page 37: Graph OLAP: Towards Online Analytical Processing on Graphs

Conclusion
We propose a Graph OLAP framework to perform multi-dimensional, multi-level analysis on network data
Measure is an aggregated graph
Informational/Topological dimensions lead to I-OLAP, T-OLAP

Page 38: Graph OLAP: Towards Online Analytical Processing on Graphs

Conclusion
Mainly focusing on I-OLAP, we discuss how a graph cube can be efficiently computed and materialized
  Distributive, algebraic, holistic
  Optimizations: localization, attenuation
  Constraint pushing

Page 39: Graph OLAP: Towards Online Analytical Processing on Graphs

Future Works
Technical issues for T-OLAP
Selective drilling and discovery-driven InfoNet-OLAP

Page 40: Graph OLAP: Towards Online Analytical Processing on Graphs

Thank You!