
Page 1: Graph Analytics: Complexity, Scalability, and Architectures (web.cse.ohio-state.edu/~lu.932/hpbdc2017/slides/hpbdc17-peter.pdf)

1 GABB: May 23, 2016

Graph Analytics: Complexity, Scalability, and Architectures

Peter M. Kogge
McCourtney Prof. of CSE, Univ. of Notre Dame
IBM Fellow (retired)

Please Sir, I want more


2 GABB: May 23, 2016

Thesis

•  Graph computation is increasingly important
•  To date: most benchmarks are batch
•  Streaming is becoming more important
•  This talk: combine batch and streaming
•  Emerging architectures have real promise


3 GABB: May 23, 2016

Graph Kernels and Benchmarks


4 GABB: May 23, 2016

Graphs

•  Graph:
  –  A set of objects called vertices
  –  A set of links between vertices called edges
  –  Both may carry “properties”

•  Graph computing is of increasing importance:
  –  Social networks
  –  Communication & power networks
  –  Recommendation systems
  –  Genomics
  –  Cyber-security



5 GABB: May 23, 2016

Classes of Graph Computation

•  Characteristics of individual vertices
  –  E.g. “properties” such as degree

•  Characteristics of the graph as a whole
  –  E.g. diameter, maximum distance, covering

•  Characteristics of pairs of vertices
  –  E.g. shortest paths

•  Characteristics of subgraphs
  –  E.g. connected components, spanning trees
  –  Similarities of subgraphs, …


6 GABB: May 23, 2016

Classes of Application Computations

•  Batch: a function applied to the entire graph, or a major subgraph, as it exists at some point in time

•  Streaming:
  –  Incoming sequence of small-scale updates:
    • New vertices or edges
    • Modification of a property of a specific vertex or edge
    • Deletions
  –  Sequence of localized queries
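The streaming update types above can be sketched as a minimal in-memory graph with update operations (the `Graph` class and its method names are illustrative assumptions, not from the talk):

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    # adjacency sets plus per-vertex property dicts
    adj: dict = field(default_factory=dict)
    props: dict = field(default_factory=dict)

    def add_edge(self, u, v):
        # new vertices appear implicitly when first referenced
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def set_prop(self, v, key, value):
        # modify a property of a specific vertex
        self.props.setdefault(v, {})[key] = value

    def delete_edge(self, u, v):
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

g = Graph()
g.add_edge("a", "b")          # new vertices and an edge
g.set_prop("a", "degree", 1)  # property modification
g.delete_edge("a", "b")       # deletion
```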


7 GABB: May 23, 2016

Current Benchmark Suites

•  Kernel Class: what class of computing the kernel performs
•  Benchmarking Efforts: S => Streaming, B => Batch, B/S => Both
•  Outputs: what is the size or structure of the result of kernel execution?

[Table: rows are kernels; column groups are Kernel Class (Connectedness, Path Analysis, Centrality, Clustering, Subgraph Isomorphism, Other), Benchmarking Efforts (Standalone, Firehose, Graph500, GraphBLAS, Graph Challenge, Graph Algorithm Platform, HPC Graph Analysis, Kepner & Gilbert, Stinger, VAST), and Outputs (Graph Modification, Compute Vertex Property, Output Global Value, Output O(1) Events, Output O(|V|) List, Output O(|V|^k) List (k>1)). Kernels covered: Anomaly (Fixed Key, Unbounded Key, Two-level Key); BC: Betweenness Centrality; BFS: Breadth First Search; Search for “Largest”; CCW: Weakly Connected Components; CCS: Strongly Connected Components; CCO: Clustering Coefficients; CD: Community Detection; GC: Graph Contraction; GP: Graph Partitioning; GTC: Global Triangle Counting; Insert/Delete; Jaccard; MIS: Maximally Independent Set; PR: PageRank; SSSP: Single Source Shortest Path; APSP: All Pairs Shortest Path; SI: General Subgraph Isomorphism; TL: Triangle Listing; Geo & Temporal Correlation. The per-cell B/S marks did not survive extraction.]


8 GABB: May 23, 2016

A Real World App


9 GABB: May 23, 2016

Real World vs. Benchmarks

•  Processing involves more than a single kernel
•  Many different classes of vertices
•  Many different classes of edges
•  Vertices may have 1000’s of properties
•  Edges may have timestamps
•  Both batch & streaming are integrated:
  –  Batch to clean/process existing data sets and add properties
  –  Streaming (today) to query the graph
  –  Streaming (tomorrow) to update the graph in real-time
•  “Neighborhoods” more important than full-graph connectivity


10 GABB: May 23, 2016

Sample Real-World Batch Analytic (From LexisNexis)

•  2012: 40+ TB of raw data
•  Periodically clean up & combine to 4-7 TB
•  Weekly “boil the ocean” to precompute answers to all standard queries:
  –  Does X have financial difficulties?
  –  Does X have legal problems?
  –  Has X had significant driving problems?
  –  Who has shared addresses with X?
  –  Who has shared property ownership with X?

Auto insurance co: “Tell me about giving an auto policy to Jane Doe” in < 0.1 sec

“Jane Doe has no indicators, but she has shared multiple addresses with Joe Scofflaw, who has the following negative indicators ….”

Look up answers to precomputed queries for “Jane Doe”, and combine relationships


11 GABB: May 23, 2016

Sample Analytic Details

•  Given: 14.2+ billion records covering
  –  800+ million entities (people, businesses)  [vertices]
  –  100+ million addresses  [vertices]
  –  records on who has resided at what address  [edges]
•  Goal: for each entity ID, find all other IDs such that they
  –  share at least 2 addresses in common, or
  –  have one address in common and a “close” last name
  –  (matching last names requires processing to check for typos: Levenshtein distance)
•  Akin to a join based on common address, with grouping and thresholding on # of join results
•  Dozens of similar analytics computed once a week on a 400-node cluster
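A minimal sketch of this analytic's core as the slide describes it: a self-join on shared address, grouped per entity pair, with a Levenshtein check on last names. The function names, the toy data layout, and the edit-distance cutoff are illustrative assumptions, not LexisNexis code:

```python
from collections import defaultdict
from itertools import combinations

def levenshtein(a, b):
    # classic dynamic-programming edit distance, used to catch last-name typos
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def related_pairs(residences, names, max_name_dist=1):
    # residences: iterable of (entity_id, address); names: entity_id -> last name
    by_addr = defaultdict(set)
    for eid, addr in residences:
        by_addr[addr].add(eid)
    shared = defaultdict(int)  # (id1, id2) -> # of addresses in common
    for ids in by_addr.values():  # the self-join on common address
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    out = set()
    for (a, b), n in shared.items():
        # >= 2 shared addresses, or 1 shared address plus a "close" last name
        if n >= 2 or levenshtein(names[a], names[b]) <= max_name_dist:
            out.add((a, b))
    return out
```

At scale this grouping runs distributed (the 400-node THOR cluster on the next slides); the logic per address group is the same.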


12 GABB: May 23, 2016

Sample Batch Implementation Platform: LexisNexis

•  Entity data kept in huge persistent tables
  –  Often with 1,000s of columns
•  Programming in declarative ECL
•  THOR: runs “offline” on 400+ node systems
  –  Batch analytic processing over large data sets
  –  Large distributed parallel file system
  –  Leaves all data sets for queries in indexed files
•  ROXIE: runs “online” on a smaller system
  –  User queries using output files from THOR
  –  Dynamically interrogates indexed files
  –  Can perform localized ECL on data subsets
•  No dynamic data updates

Software architecture: https://upload.wikimedia.org/wikipedia/commons/0/02/Fig4b_HPCC.jpg


13 GABB: May 23, 2016

Execution on Today’s Architectures

•  Model built to estimate usage of:
  –  Bandwidth: network, disk, memory
  –  Processing capability
•  Baseline: cluster of 400 dual-Xeon nodes
•  Menu of improvement options investigated
•  “Conventional” improvements:
  –  No one option gives a >45% increase in performance
  –  Significant gains only when all are applied at once
•  “Unconventional” improvements even better:
  –  ARMs for Xeons
  –  2-level memory
  –  Computing in “3D memory”


14 GABB: May 23, 2016

A Model Based on Contemporary Architecture

•  Optimal code streams data through multiple kernels till a barrier
•  No one resource is the consistent bottleneck
•  Inter-node comm: dynamically random small messages

[Chart: resources used per node (sec, log scale) vs. step #, for Disk, CPU, Memory, and Network. Baseline: 1026 s on 10 racks.]


15 GABB: May 23, 2016

The Core of This Computation as a Benchmark Kernel


16 GABB: May 23, 2016

Sample Analytic Details

•  Given: 14.2+ billion records covering
  –  800+ million entities (people, businesses)  [vertices]
  –  100+ million addresses  [vertices]
  –  records on who has resided at what address  [edges]
•  Goal: for each entity ID, find all other IDs such that they
  –  share at least 2 addresses in common, or
  –  have one address in common and a “close” last name
  –  (matching last names requires processing to check for typos: Levenshtein distance)
•  Akin to a join based on common address, but with grouping and thresholding on # of join results
•  Dozens of similar analytics computed once a week on a 400-node cluster


17 GABB: May 23, 2016

Neighborhoods & Jaccard Coefficients: The Essence of NORA Problems

N(u) = set of neighbors of u
Γ(u,v) = fraction of the neighbors of u and v that are in common:
Γ(u,v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|

Alternative:
d(u) = # of neighbors of u
γ(u,v) = # of common neighbors of u and v
Γ(u,v) = γ(u,v) / (d(u) + d(v) − γ(u,v))

[Figure: vertices u and v; green and purple edges lead to common neighbors, blue edges lead to non-common neighbors.]

The LexisNexis shared-address NORA problem is an extension of this
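The two formulations above are equivalent and easy to check in code (a direct transcription, using Python sets for the neighbor sets):

```python
def jaccard(neighbors, u, v):
    # Γ(u,v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
    nu, nv = neighbors[u], neighbors[v]
    union = len(nu | nv)
    return len(nu & nv) / union if union else 0.0

def jaccard_alt(neighbors, u, v):
    # Γ(u,v) = γ(u,v) / (d(u) + d(v) − γ(u,v))
    nu, nv = neighbors[u], neighbors[v]
    common = len(nu & nv)
    return common / (len(nu) + len(nv) - common)

neighbors = {"u": {1, 2, 3, 4}, "v": {3, 4, 5}}
# both forms give |{3,4}| / |{1,2,3,4,5}| = 2/5 = 0.4
```

The second form matters in practice: it needs only degrees and a common-neighbor count, never the materialized union.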


18 GABB: May 23, 2016

Results of a Map-Reduce Batch Implementation

[Plots (log-log): time (sec) vs. # of vertices, and # of coefficients vs. # of vertices; measured vs. modeled.]

JACS (Jaccard coefficients / sec) = 1.6E6 × V^0.26

•  RMAT matrices, average d(i) = 16, on a 1000-node system, each node with 12 cores & 64 GB
•  # of coefficients grows more rapidly than # of vertices
•  Time also grows more than linearly
•  The entire LN analytic is approx. 10X faster

Burkhardt, “Asking Hard Graph Questions,” Beyond Watson Workshop, Feb. 2014.


19 GABB: May 23, 2016

A Streaming Form: Better Match to Future Real-Time NORA

•  Assume edges arrive in a stream {(u,v)}
  –  Graph keeps just the “largest Γ” for each vertex
•  Question: does any individual edge addition significantly change any vertex’s largest Γ?
  –  Especially, does it change enough to cross some threshold?
•  Implementation involves looking at:
  –  Neighbors of neighbors of u (to update u’s max Γ)
  –  Neighbors of v (to update their peak Γ, given u now shares neighbors)
•  A Bloom filter-like heuristic can bound thresholds early
•  Lots of interesting variations

P. Kogge, “Jaccard Coefficients as a Potential Graph Benchmark,” GABB, IPDPS, 2016
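A naive sketch of this streaming form, without the Bloom-filter heuristic: on each edge arrival, recompute the largest Γ over the two-hop neighborhood of each vertex the insertion can affect. The class name and alerting scheme are illustrative choices, not the paper's implementation:

```python
from collections import defaultdict

def jaccard(nu, nv):
    # Γ via the degree form: common / (d(u) + d(v) - common)
    common = len(nu & nv)
    denom = len(nu) + len(nv) - common
    return common / denom if denom else 0.0

class StreamingMaxJaccard:
    def __init__(self, threshold):
        self.adj = defaultdict(set)
        self.max_gamma = defaultdict(float)  # per-vertex largest Γ seen so far
        self.threshold = threshold

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)
        alerts = []
        # the new edge can raise Γ only for u, v, and their immediate neighbors
        # (stored maxima may go stale as other Γ values fall; a variation)
        for a in {u, v} | self.adj[u] | self.adj[v]:
            two_hop = {w for n in self.adj[a] for w in self.adj[n]} - {a}
            best = max((jaccard(self.adj[a], self.adj[w]) for w in two_hop),
                       default=0.0)
            if best > self.max_gamma[a]:
                if self.max_gamma[a] < self.threshold <= best:
                    alerts.append(a)  # this vertex's largest Γ just crossed
                self.max_gamma[a] = best
        return alerts
```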


20 GABB: May 23, 2016

Looking Forward: Converting Such Problems into Large Graphs

•  Create a graph of name/address records, with vertex classes for entity ID, last name, and address
•  Query: given a specific ID1, find all ID2s that meet the requirements:
  –  Start with ID1 and find all ID2 reachable via a shared address
  –  Score each path
  –  Sum all path scores & pass (ID1, ID2) if > threshold

[Diagram: ID1 linked through last-name and address vertices to candidate ID2s.]
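The query steps above can be sketched as a bounded traversal with path scoring (the data layout, the score function, and the threshold are illustrative assumptions):

```python
from collections import defaultdict

def related_ids(entity_addrs, addr_entities, id1, score, threshold):
    # entity_addrs: id -> set of addresses; addr_entities: address -> set of ids
    totals = defaultdict(float)
    # start with ID1 and reach every ID2 via a shared address
    for addr in entity_addrs[id1]:
        for id2 in addr_entities[addr] - {id1}:
            totals[id2] += score(id1, addr, id2)  # score each ID1-addr-ID2 path
    # pass (ID1, ID2) only if the summed path scores clear the threshold
    return {id2 for id2, s in totals.items() if s > threshold}
```

A real score function would weight paths by, e.g., how many entities share the address and how recently the residences overlap; a constant score reduces this to counting shared addresses.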


21 GABB: May 23, 2016

Ability to do On-Demand Graph Queries Will Change the Business Model

[Diagram: graph partitioned into IDs, last names, and addresses, annotated with partition sizes of 96 GB, 18 GB, and 300 GB. Query steps: (1) start with an ID, (2) follow last names to all matching addresses, (3) then on to other IDs.]


22 GABB: May 23, 2016

Canonical Graph Processing


23 GABB: May 23, 2016

Canonical Graph Processing

[Diagram: a Persistent Big Data/Graph Data Set, loaded by Batch Input. Selection Criteria drive Seed Identification; Seeds drive Subgraph Extraction; the extracted Subgraphs feed parallel Batch Analytics, whose results flow back as Property Updates. Real-Time Stream Events drive Local Updates and queries against Graph Properties.]


24 GABB: May 23, 2016

Streaming Characteristics

•  Two kinds:
  –  Streams of queries
  –  Updates to persistent data
•  Both typically localized, at least to start with
•  Streaming updates are often multi-step:
  –  Perform the update (using atomics)
  –  Perform some local computations
  –  Compare to a threshold
  –  If the threshold is passed, extract some larger subset
  –  And perform a bigger analytic on it
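The multi-step update pattern above, end to end (a single-threaded sketch; the degree threshold, 1-hop extraction, and edge-count "bigger analytic" are illustrative stand-ins for the real steps):

```python
def apply_update(adj, degree, u, v, threshold):
    # 1. perform the update (would use atomics in a concurrent setting)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    # 2. local computation: refresh degrees of the touched vertices
    degree[u] = len(adj[u])
    degree[v] = len(adj[v])
    # 3. compare to a threshold
    for x in (u, v):
        if degree[x] >= threshold:
            # 4. threshold passed: extract a larger subset (1-hop neighborhood)
            sub = {x} | adj[x]
            # 5. perform a bigger analytic on it (here: count subgraph edges)
            edges = sum(len(adj.get(a, set()) & sub) for a in sub) // 2
            return x, sub, edges
    return None  # update absorbed locally, no escalation
```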


25 GABB: May 23, 2016

Observations

•  Data sets live in persistent memory
•  Streaming updates trigger threshold tests
•  Streaming queries result in local graph traversals
•  Batch analytics used primarily for analysis & new property computation

[Diagram: the canonical graph processing flow from slide 23.]


26 GABB: May 23, 2016

Emerging Architectures to Accelerate Graph Processing


27 GABB: May 23, 2016

Knights Landing: 2-Level Memory

•  2-level memory:
  –  High-capacity memory for “persistent graphs”
  –  HBM memory for “subgraph” working memory
  –  Lots of multi-threaded cores that can see all memory


28 GABB: May 23, 2016

A Novel Sparse Graph Processor

•  Express the graph as an adjacency matrix
•  Graph ops become matrix-vector or matrix-matrix products

[Block diagram: Matrix Reader, Matrix Memory, Sorter, ALU, Matrix Writer, Control, Network Interface.]

Song, et al. “Novel Graph Processor Architecture, Prototype System, and Results,” IEEE HPEC 2016
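The matrix view can be sketched without any sparse-matrix library: one BFS frontier expansion is a matrix-vector product over the Boolean semiring, with the adjacency matrix stored row-sparse. This is a generic illustration of the representation, not the processor's actual pipeline:

```python
def spmv_frontier(adj_rows, frontier):
    # adj_rows: adjacency matrix in row-sparse form (row u -> set of columns
    # with nonzeros, i.e. u's out-neighbors); frontier: set of active vertices.
    # One Boolean matrix-vector step: OR together the rows the frontier selects.
    out = set()
    for u in frontier:
        out |= adj_rows.get(u, set())
    return out

# a small directed graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}
adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
level1 = spmv_frontier(adj, {0})     # vertices reached in one hop
level2 = spmv_frontier(adj, level1)  # vertices reached in two hops
```

Swapping the (OR, AND) semiring for (min, +) turns the same product into a shortest-path relaxation step, which is why a matrix engine covers many of the benchmark kernels.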


29 GABB: May 23, 2016

Migrating Threads for Streaming

•  Single system-wide address space
•  Parallelism via memory channels
•  Gossamer Cores (GCs) execute Gossamer Threads at nodelets:
  –  Perform local computations & memory references (incl. atomics)
  –  Migrate to other nodelets without software involvement
  –  Spawn new threads
  –  Call system services on SCs
•  Stationary Cores (SCs): conventional cores
  –  Execute the operating system
  –  Manage IO / file system
  –  Call or spawn Gossamer Threads
•  Programmed in Cilk

[Diagram: nodes on an internal network; each node holds nodelets, each with a memory controller, memory, and Gossamer Cores, plus a migration engine & network interface.]

If the mountain won’t come to you, you must go to the mountain - ancient proverb


30 GABB: May 23, 2016

Emu 1

Emu Chick:
•  8 nodes; 64 nodelets
•  Copy-room environment

Emu1 Memory Server:
•  256 nodes; 2048 nodelets
•  Server-room environment
•  FCS 2017

The node boards are the same!

Tool chain: Cilk C today, C++ in progress

T. Dysart, et al. “Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture,” SC16


31 GABB: May 23, 2016

Where Useful

[Diagram: the canonical graph processing flow from slide 23, annotated with where each emerging architecture helps: 2-level memory (the persistent data set), sparse graph engines (batch analytics), and migrating threads with rich atomics (streaming updates and queries).]


32 GABB: May 23, 2016

Projection for the LexisNexis Problem - Still “Batch Mode” Computations

[Chart (log scale): speedup over the 2012 baseline vs. # of racks, for HeavyWeight, Lightweight (ARM servers), Next-Gen Compute (KNL-like), and Emu (Emu1, Emu2, Emu3); baseline and upgrade points marked. Emu1 assumes 400 MHz GCs and 2400 MT/s DRAM channels.]


33 GABB: May 23, 2016

Conclusions

•  Most graph benchmarks today:
  –  Batch oriented
  –  Assume simple graphs
  –  Focus on total-graph properties
•  Real-world apps today are radically different:
  –  Many vertex/edge classes with many properties
  –  Real interest: localized “neighborhoods”
•  Key queries are not computable in real-time today
•  A “two-level” graph processing flow meshes both streaming and batch computations
•  Architectures are emerging that will support such graph processing directly


34 GABB: May 23, 2016

Acknowledgements: Funding

•  In part by NSF CCF-1642280 •  In part by the Univ. of Notre Dame