thu bernstein key_warp_speed


Upload: eswcsummerschool

Posted on 25-Jan-2015


TRANSCRIPT

Page 1: Thu bernstein key_warp_speed
Page 2: Thu bernstein key_warp_speed

Processing Linked Data at Warp Speed

Abraham Bernstein

Image: NASA, CC BY, http://www.flickr.com/photos/nasa_jsc_photo/sets/72157629726792248/with/7197236116/0

Page 3: Thu bernstein key_warp_speed


Page 4: Thu bernstein key_warp_speed

"Earth's Location in the Universe (JPEG)" by Andrew Z. Colvin - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons http://commons.wikimedia.org/wiki/File:Earth%27s_Location_in_the_Universe_(JPEG).jpg#mediaviewer/File:Earth%27s_Location_in_the_Universe_(JPEG).jpg

Page 5: Thu bernstein key_warp_speed
Page 6: Thu bernstein key_warp_speed
Page 7: Thu bernstein key_warp_speed

"IBM Electronic Data Processing Machine - GPN-2000-001881" NASA, Public Domain @ Wikimedia Commons -

http://commons.wikimedia.org/wiki/File:IBM_Electronic_Data_Processing_Machine_-_GPN-2000-001881.jpg

Page 8: Thu bernstein key_warp_speed

Processing Graphs

Page 9: Thu bernstein key_warp_speed

Semantic Web Reasoning

KB: Asserted Triples → Entailed KB: Asserted & Inferred Triples

DL Reasoning, Inductive R., Analogical R., Your R.

Page 10: Thu bernstein key_warp_speed
Page 11: Thu bernstein key_warp_speed

Signal/Collect

P. Stutz, A. Bernstein, and W. Cohen. Signal/Collect: Graph algorithms for the (semantic) web. International Semantic Web Conference–ISWC 2010. Springer Berlin Heidelberg, 2010. 764-780.

Page 12: Thu bernstein key_warp_speed

Processing Graphs Naturally

• Vertices as stateful processing units
• Vertices interact through signals along edges
• Signals are collected by a processing function that updates the vertex state
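To make the model concrete, here is a minimal synchronous sketch in plain Scala. The names (`SignalCollectSketch`, `Edge`, `run`) are hypothetical and this is not the Signal/Collect framework's API; it only assumes that a vertex is an id with a state, that each edge computes a signal from its source vertex's state, and that a collect function folds the received signals into a new state.

```scala
// Hypothetical names throughout: a plain-Scala illustration of the programming model,
// not the Signal/Collect framework's API.
object SignalCollectSketch {
  final case class Edge(source: Int, target: Int)

  /** Synchronous signal/collect: in every step each edge carries a signal computed from
    * its source vertex's state, then each vertex collects its incoming signals into a
    * new state. Stops when no state changes (or after maxSteps steps). */
  def run[S, Sig](
      initial: Map[Int, S],
      edges: Seq[Edge],
      signal: S => Sig,
      collect: (S, Seq[Sig]) => S,
      maxSteps: Int = 100
  ): Map[Int, S] = {
    var states = initial
    var changed = true
    var step = 0
    while (changed && step < maxSteps) {
      // Signal phase: group the signals by the vertex that receives them.
      val inbox: Map[Int, Seq[Sig]] =
        edges.groupBy(_.target).map { case (t, es) => t -> es.map(e => signal(states(e.source))) }
      // Collect phase: every vertex folds its received signals into its state.
      val next = states.map { case (id, s) => id -> collect(s, inbox.getOrElse(id, Seq.empty)) }
      changed = next != states
      states = next
      step += 1
    }
    states
  }
}
```

For example, with `signal = state` and `collect = max(state, signals)`, the largest value spreads along the edges: `run(Map(1 -> 5, 2 -> 1, 3 -> 3), Seq(Edge(1, 2), Edge(2, 3)), (s: Int) => s, (s: Int, sigs: Seq[Int]) => (s +: sigs).max)` ends with every vertex in state 5.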

Page 13: Thu bernstein key_warp_speed

Signal/Collect: An Intuition for RDFS Subclass Inference

• Define the graph structure:
  • Vertices represent RDFS classes
  • Edges point from superclasses to subclasses
• Initialize each vertex state with the class that the vertex represents

Vertices (id and initial state):
• id: animal, state: {animal}
• id: bird, state: {bird}
• id: owl, state: {owl}
• id: penguin, state: {penguin}

Edges: animal → bird, bird → owl, bird → penguin

Stutz et al., 2010

Page 14: Thu bernstein key_warp_speed

Signal/Collect: An Intuition for RDFS Subclass Inference

Stutz et al., 2010

Initial states: animal {animal}, bird {bird}, owl {owl}, penguin {penguin}

Signals along the edges: {animal} to bird; {bird, animal} to owl and to penguin

Final states: animal {animal}, bird {bird, animal}, owl {owl, bird, animal}, penguin {penguin, bird, animal}

def collect = state ∪ (⋃ s ∈ signals: s)
def signal = state
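The subclass example can be replayed as a fixpoint computation. This is a sketch under the assumption that plain Scala maps stand in for vertices (`SubclassInference`, `hierarchy` and `infer` are hypothetical names); the signal and collect steps follow the two definitions above.

```scala
// Sketch only: plain Scala collections stand in for the framework's vertices and edges.
object SubclassInference {
  // Edges point from superclass to subclass, as on the slide.
  val hierarchy: Seq[(String, String)] =
    Seq("animal" -> "bird", "bird" -> "owl", "bird" -> "penguin")

  def infer(edges: Seq[(String, String)]): Map[String, Set[String]] = {
    val classes = edges.flatMap { case (sup, sub) => Seq(sup, sub) }.distinct
    // Initial state: each vertex holds only the class it represents.
    var state: Map[String, Set[String]] = classes.map(c => c -> Set(c)).toMap
    var changed = true
    while (changed) {
      val next = state.map { case (c, own) =>
        // signal = state: each superclass sends its whole state to its subclasses.
        val signals = edges.collect { case (sup, sub) if sub == c => state(sup) }
        // collect = state ∪ (union of all received signals)
        c -> signals.foldLeft(own)(_ union _)
      }
      changed = next != state
      state = next
    }
    state
  }
}
```

`SubclassInference.infer(SubclassInference.hierarchy)` ends in the final states listed on the slide, e.g. `penguin -> Set(penguin, bird, animal)`.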

Page 15: Thu bernstein key_warp_speed

Scoring/Asynchronicity: Single-Source Shortest Path

Figure: step-by-step run in which the source vertex starts with state 0 and all other vertices with state ∞; in each step, signals carry state + weight along the edges and every vertex keeps the minimum, so distances 1, 2 and 3 spread outward

def signal = state + weight
def collect = min(state, min(signals))
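The two definitions can be tried out on a small graph. The sketch below (hypothetical `ShortestPaths.sssp`, synchronous steps, non-negative edge weights assumed) starts the source at 0 and every other vertex at ∞ and repeats signal and collect until nothing changes.

```scala
object ShortestPaths {
  final case class Edge(source: Int, target: Int, weight: Double)

  def sssp(vertices: Seq[Int], edges: Seq[Edge], sourceId: Int): Map[Int, Double] = {
    // Initial states: 0 for the source, infinity for everyone else.
    var state = vertices.map(v => v -> (if (v == sourceId) 0.0 else Double.PositiveInfinity)).toMap
    var changed = true
    while (changed) {
      val next = state.map { case (v, s) =>
        // signal = state + weight, evaluated for every edge that ends in v
        val signals = edges.filter(_.target == v).map(e => state(e.source) + e.weight)
        // collect = min(state, min(signals)); a vertex without in-edges keeps its state
        v -> (s +: signals).min
      }
      changed = next != state
      state = next
    }
    state
  }
}
```

On the edges 0→1 (1), 1→2 (1), 0→2 (4) and 2→3 (1), vertex 2 first receives 4 over the direct edge and improves to 2 one step later via vertex 1.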

Page 16: Thu bernstein key_warp_speed

Scoring/Asynchronicity: Single-Source Shortest Path

Figure: the same shortest-path computation, executed asynchronously with signal scoring

var oldState = infinity
def scoreSignal = if (state != oldState) 1 else 0
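Signal scoring lets a vertex stay quiet when its state has not changed since it last signalled. As a rough sketch (hypothetical `ScoredSssp.run`, with a single FIFO work queue standing in for asynchronous scheduling), only vertices whose state changed are queued to signal, and the function counts the signals actually sent.

```scala
import scala.collection.mutable

object ScoredSssp {
  final case class Edge(source: Int, target: Int, weight: Double)

  /** Returns the distances and the number of signals sent. */
  def run(vertices: Seq[Int], edges: Seq[Edge], sourceId: Int): (Map[Int, Double], Int) = {
    val state = mutable.Map.empty[Int, Double] ++= vertices.map(v => v -> Double.PositiveInfinity)
    state(sourceId) = 0.0
    val outEdges = edges.groupBy(_.source)
    // Vertices whose signal score is 1: their state changed since they last signalled.
    val toSignal = mutable.Queue(sourceId)
    var signalsSent = 0
    while (toSignal.nonEmpty) {
      val v = toSignal.dequeue()
      for (e <- outEdges.getOrElse(v, Seq.empty)) {
        signalsSent += 1
        val candidate = state(v) + e.weight // signal = state + weight
        if (candidate < state(e.target)) {  // collect = min(state, signal)
          state(e.target) = candidate
          // Schedule the target once; it signals its latest state when dequeued.
          if (!toSignal.contains(e.target)) toSignal.enqueue(e.target)
        }
      }
    }
    (state.toMap, signalsSent)
  }
}
```

On the edges 0→1 (1), 1→2 (1), 0→2 (4) and 2→3 (1), this sends 4 signals in total, one per edge, because vertex 2 is improved to 2 before it is dequeued and therefore signals only once.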

Page 17: Thu bernstein key_warp_speed

PageRank in Code

Algorithm:

    // Vertex: a document's rank is 0.15 plus 0.85 times the sum of its incoming signals.
    class Document(id: Any) extends Vertex(id, 0.15) {
      def collect = 0.15 + 0.85 * signals[Double].foldLeft(0.0)(_ + _)
    }

    // Edge: a citation passes on the citer's rank, scaled by weight / sum of out-weights.
    class Citation(citer: Any, cited: Any) extends Edge(citer, cited) {
      def signal = source.state.asInstanceOf[Double] * weight / source.sumOfOutWeights
    }

Initialization and Execution:

    object Algorithm {
      def executeCitationRank(db: SparqlAccessor): Unit = {
        val computeGraph = new AsynchronousComputeGraph()
        // Retrieve all (citing, cited) document pairs via SPARQL.
        val citations = new SparqlTuples(db, "select ?source ?target where {" +
          "?source <http://lsdis.cs.uga.edu/projects/semdis/opus#cites> ?target}")
        // Add both documents as vertices and the citation as an edge.
        citations foreach { case (citer, cited) =>
          computeGraph.addVertex[Document](citer)
          computeGraph.addVertex[Document](cited)
          computeGraph.addEdge[Citation](citer, cited)
        }
        computeGraph.execute()
      }
    }
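The same citation rank can be computed without the framework. In this sketch (hypothetical `CitationRank.pageRank`; all edge weights assumed to be 1, so `sumOfOutWeights` is simply the out-degree), every vertex starts at 0.15, and the loop applies collect = 0.15 + 0.85 · Σ signals until the ranks stop changing.

```scala
object CitationRank {
  /** Unnormalised PageRank in the slide's style:
    * signal = source.state * weight / source.sumOfOutWeights (weights assumed 1),
    * collect = 0.15 + 0.85 * sum(signals). */
  def pageRank(edges: Seq[(String, String)], tolerance: Double = 1e-9, maxSteps: Int = 1000): Map[String, Double] = {
    val vertices = edges.flatMap { case (citer, cited) => Seq(citer, cited) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size.toDouble }
    var state = vertices.map(v => v -> 0.15).toMap // initial state, as in Vertex(id, 0.15)
    var delta = Double.MaxValue
    var step = 0
    while (delta > tolerance && step < maxSteps) {
      val next = vertices.map { v =>
        val signals = edges.collect { case (src, dst) if dst == v => state(src) / outDegree(src) }
        v -> (0.15 + 0.85 * signals.sum)
      }.toMap
      delta = vertices.map(v => math.abs(next(v) - state(v))).max
      state = next
      step += 1
    }
    state
  }
}
```

For the citations A→B, A→C, B→C and C→A, the ranks converge to roughly A = 1.163, B = 0.644 and C = 1.192. Because this variant is not normalised to 1, the ranks sum to the number of documents when no document is dangling.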

Page 18: Thu bernstein key_warp_speed

Signal/Collect as a Platform

P. Stutz, A. Bernstein, and W. Cohen. Signal/Collect: Graph algorithms for the (semantic) web. International Semantic Web Conference–ISWC 2010. Springer Berlin Heidelberg, 2010. 764-780.

TripleRush | Fraud Detection | DCOPs

Images: Bo Insogna, https://flic.kr/p/9famxT, CC BY-NC-ND, https://creativecommons.org/licenses/by-nc-nd/2.0/; Antana, https://flic.kr/p/gGQPhA, CC BY-SA, https://creativecommons.org/licenses/by-sa/2.0/; Jeff Kubina, https://flic.kr/p/2CG5PU, CC BY-SA

Page 19: Thu bernstein key_warp_speed

Triple Store

Data: Elvis inspired Dylan; Dylan inspired Jobs

Query: ?X inspired ?Y, ?Y inspired ?Z

Page 20: Thu bernstein key_warp_speed

Figure: TripleRush graph for the triple Elvis inspired Dylan, with the index vertices [* inspired Dylan], [Elvis inspired *], [Elvis * Dylan], [* * Dylan], [Elvis * *], [* inspired *] and [* * *]

Page 21: Thu bernstein key_warp_speed

Figure: TripleRush graph with the triples Elvis inspired Dylan and Dylan inspired Jobs; the second triple adds the index vertices [Dylan inspired *], [* inspired Jobs], [Dylan * Jobs], [Dylan * *] and [* * Jobs]

Page 22: Thu bernstein key_warp_speed

Figure: vertices [Elvis inspired Dylan], [Dylan inspired *], [* inspired *] and [Dylan inspired Jobs], plus a Query Vertex

Query Vertex sends: ?X inspired ?Y, ?Y inspired ?Z
Matching Elvis inspired Dylan leaves: Dylan inspired ?Z
Matching Dylan inspired Jobs leaves: Jobs inspired ?Z, which fails: no vertex with ID [Jobs inspired *]
Matching Dylan inspired ?Z against Dylan inspired Jobs completes: Elvis inspired Dylan, Dylan inspired Jobs
Query Vertex receives: { ?X = Elvis, ?Y = Dylan, ?Z = Jobs }
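The query walk-through can be mimicked sequentially. The sketch below is a centralised stand-in for TripleRush's message passing, with hypothetical names (`InspiredQuery`, `buildIndex`, `answer`): every triple is registered under the eight keys obtained by replacing any subset of its positions with `*` (mirroring the index vertices), and each pattern is answered by a lookup that uses the bindings found so far. A missing key such as `[Jobs inspired *]` ends that branch.

```scala
object InspiredQuery {
  type Triple = (String, String, String)
  type Pattern = (String, String, String) // terms starting with '?' are variables

  val data: Seq[Triple] = Seq(("Elvis", "inspired", "Dylan"), ("Dylan", "inspired", "Jobs"))

  private def isVar(term: String): Boolean = term.startsWith("?")

  /** Registers each triple under all 8 keys with any subset of positions replaced by "*". */
  def buildIndex(triples: Seq[Triple]): Map[Triple, Seq[Triple]] = {
    val entries = for {
      t <- triples
      s <- Seq(t._1, "*")
      p <- Seq(t._2, "*")
      o <- Seq(t._3, "*")
    } yield ((s, p, o), t)
    entries.groupBy(_._1).map { case (key, es) => key -> es.map(_._2) }
  }

  /** Matches the patterns left to right, carrying the variable bindings along. */
  def answer(patterns: List[Pattern], index: Map[Triple, Seq[Triple]],
             bindings: Map[String, String] = Map.empty): List[Map[String, String]] =
    patterns match {
      case Nil => List(bindings)
      case (s, p, o) :: rest =>
        // Bound variables become constants in the lookup key; unbound ones become "*".
        def resolve(term: String) = if (isVar(term)) bindings.getOrElse(term, "*") else term
        val key = (resolve(s), resolve(p), resolve(o))
        index.getOrElse(key, Nil).toList.flatMap { case (subj, pred, obj) =>
          val newBindings = Seq(s -> subj, p -> pred, o -> obj).filter(pair => isVar(pair._1))
          // Reject triples that contradict a variable already bound (e.g. ?X inspired ?X).
          val merged = newBindings.foldLeft(Option(bindings)) {
            case (Some(b), (v, value)) if b.get(v).forall(_ == value) => Some(b + (v -> value))
            case _ => None
          }
          merged.toList.flatMap(b => answer(rest, index, b))
        }
    }
}
```

With the two data triples, `answer(List(("?X", "inspired", "?Y"), ("?Y", "inspired", "?Z")), buildIndex(data))` returns the single result `Map(?X -> Elvis, ?Y -> Dylan, ?Z -> Jobs)`; the branch that binds `?Y = Jobs` dies because the index has no key `(Jobs, inspired, *)`.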

Page 23: Thu bernstein key_warp_speed


Page 24: Thu bernstein key_warp_speed

Performance Results

Distributed (8 nodes), LUBM 10240 (~1.36 billion triples), fastest of 10 runs:

| System | L1 | L2 | L3 | L4 | L5 | L6 | L7 | Geo. mean |
|---|---|---|---|---|---|---|---|---|
| TripleRush | 3,111.2 | 1,457.9 | 0.7 | 3.5 | 9.5 | 29.1 | 1,165.8 | 62.1 |
| Trinity.RDF | 12,648.0 | 6,018.0 | 8,735.0 | 5.0 | 4.0 | 9.0 | 31,214.0 | 450.0 |
| TriAD | 7,631.0 | 1,663.0 | 4,290.0 | 2.1 | 0.5 | 69.0 | 14,895.0 | 249.0 |
| TriAD-SG | 2,146.0 | 2,025.0 | 1,647.0 | 1.3 | 0.7 | 1.4 | 16,863.0 | 106.0 |

Single-node, LUBM 160 (~21 million triples), fastest of 10 runs:

| System | L1 | L2 | L3 | L4 | L5 | L6 | L7 | Geo. mean |
|---|---|---|---|---|---|---|---|---|
| TripleRush | 22.6 | 27.8 | 0.4 | 1.0 | 0.4 | 0.9 | 21.2 | 2.94 |
| Trinity.RDF | 281.0 | 132.0 | 110.0 | 5.0 | 4.0 | 9.0 | 630.0 | 46.0 |
| TriAD | 427.0 | 117.0 | 210.0 | 2.0 | 0.5 | 19.0 | 693.0 | 39.0 |
| TriAD-SG | 97.0 | 140.0 | 31.0 | 1.0 | 0.2 | 1.8 | 711.0 | 14.0 |
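The Geo. mean column summarises the seven query times L1 to L7 in one number. A small helper (hypothetical `GeoMean.geometricMean`) computes it through logarithms; applied to the TripleRush rows, it gives about 2.94 and 62.1, matching the tables.

```scala
object GeoMean {
  /** Geometric mean: the n-th root of the product, computed via logarithms to avoid overflow. */
  def geometricMean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.forall(_ > 0), "geometric mean needs positive values")
    math.exp(xs.map(x => math.log(x)).sum / xs.size)
  }
}
```

Using logarithms keeps the computation stable even when a row mixes values in the thousands with values below 1.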

Page 25: Thu bernstein key_warp_speed

Ongoing Work: Graph Partitioning

• https://github.com/uzh/triplerush

Page 26: Thu bernstein key_warp_speed

TripleRush | Fraud Detection | DCOPs

Signal/Collect as a Platform


Page 27: Thu bernstein key_warp_speed

Decomposing Fraud Patterns

• Participants can be labeled as:
  • Splitters
  • Aggregators
  • Forwarders

Figure: example transaction flows (amounts shown: $10k, $11k, 2 x $5k; 10k, 2k, 4k, 6k and 8k CHF)
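The talk names the three roles but does not spell out how they are assigned. Purely as an assumption, the sketch below labels accounts by the shape of their transfers: one incoming and several outgoing transfers make a splitter, several incoming and one outgoing an aggregator, and exactly one in and one out a forwarder (`FraudRoles` and its members are hypothetical).

```scala
object FraudRoles {
  final case class Transfer(from: String, to: String, amount: Double)

  sealed trait Role
  case object Splitter extends Role
  case object Aggregator extends Role
  case object Forwarder extends Role

  /** Hypothetical degree-based labelling; amounts are carried but not used by these rules. */
  def roles(transfers: Seq[Transfer]): Map[String, Role] = {
    val inDegree = transfers.groupBy(_.to).map { case (acct, ts) => acct -> ts.size }
    val outDegree = transfers.groupBy(_.from).map { case (acct, ts) => acct -> ts.size }
    val accounts = (inDegree.keySet ++ outDegree.keySet).toSeq
    accounts.flatMap { acct =>
      val role: Option[Role] = (inDegree.getOrElse(acct, 0), outDegree.getOrElse(acct, 0)) match {
        case (1, out) if out >= 2 => Some(Splitter)   // one payment fanned out
        case (n, 1) if n >= 2     => Some(Aggregator) // several payments merged
        case (1, 1)               => Some(Forwarder)  // passed straight on
        case _                    => None             // sources, sinks and other shapes
      }
      role.map(r => acct -> r)
    }.toMap
  }
}
```

For a chain X → S, S → A1, S → A2, A1 → F, A2 → F, F → Y, the account S is labelled a splitter, A1 and A2 forwarders, F an aggregator, and the endpoints X and Y stay unlabelled.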

Page 28: Thu bernstein key_warp_speed

Elicitation Process

Filter → Connect → Match

Page 29: Thu bernstein key_warp_speed

Bitcoin Transactions: Runtime vs. Matching Complexity

Figure: time (sec, 0 to 7000) against matching complexity (4 to 12), showing time in GC and total processing time

Dataset size: 50M transactions; matching duration: 1 week

Runtime: 27 min, throughput: 1.8M / min
Runtime: 35 min, throughput: 1.4M / min
Runtime: 98 min, throughput: 0.5M / min

Page 30: Thu bernstein key_warp_speed

TripleRush | Fraud Detection | DCOPs

Signal/Collect as a Platform


Page 31: Thu bernstein key_warp_speed

Distributed Constraint Optimization

Figure: constraint graph over the variables x, y, z and q

Domains: x ∈ {0,1,2}, y ∈ {0,1}, z ∈ {1,2}, q ∈ {0,1,2}
Constraints: x ≠ y, y ≠ z, z ≠ x, q ≠ z
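The example problem is small enough to check exhaustively. The sketch below (hypothetical `ColoringCsp`) encodes the domains and ≠ constraints and enumerates all assignments centrally; it is a correctness check for this instance, not a distributed algorithm such as DSA.

```scala
object ColoringCsp {
  val domains: Map[String, Seq[Int]] =
    Map("x" -> Seq(0, 1, 2), "y" -> Seq(0, 1), "z" -> Seq(1, 2), "q" -> Seq(0, 1, 2))

  // Each pair must take different values (the ≠ constraints on the slide).
  val constraints: Seq[(String, String)] = Seq("x" -> "y", "y" -> "z", "z" -> "x", "q" -> "z")

  /** Number of violated constraints; a DCOP solver tries to drive this to 0. */
  def conflicts(assignment: Map[String, Int]): Int =
    constraints.count { case (a, b) => assignment(a) == assignment(b) }

  /** Centralised brute force over every combination of domain values. */
  def solutions: Seq[Map[String, Int]] =
    for {
      x <- domains("x")
      y <- domains("y")
      z <- domains("z")
      q <- domains("q")
      assignment = Map("x" -> x, "y" -> y, "z" -> z, "q" -> q)
      if conflicts(assignment) == 0
    } yield assignment
}
```

It finds 6 conflict-free assignments: x, y and z must be pairwise different, which their domains allow in three ways, and q can then take either of the two values other than z's.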

Page 32: Thu bernstein key_warp_speed

Vertex Coloring in Action

Optimized version of DSA (Distributed Stochastic Algorithm) running on a MacBook Pro with 8 workers (slow, due to lots of IO for logging, bookkeeping, etc.)

• Scaled to 10 million vertices/variables (Verman and Bernstein, 2014)

Page 33: Thu bernstein key_warp_speed

Industry Usage!

“We are using Signal/Collect to analyze millions of claims every day to identify opportunities for our clients to save money through better healthcare or avoiding fraud, waste, and abuse.”

US Healthcare Analytics Company

Page 34: Thu bernstein key_warp_speed

Page 35: Thu bernstein key_warp_speed

Processing Graph Streams

Page 36: Thu bernstein key_warp_speed

147.1b € 97.5b €

Page 37: Thu bernstein key_warp_speed

147.1b € 97.5b €

5’100

Page 38: Thu bernstein key_warp_speed
Page 39: Thu bernstein key_warp_speed

3m Viewers 300 Events/s

Page 40: Thu bernstein key_warp_speed

250 TV Channels 25 frames/s

Page 41: Thu bernstein key_warp_speed

EPG for 7 days LOD enhanced

Page 42: Thu bernstein key_warp_speed
Page 43: Thu bernstein key_warp_speed

Traditional TripleStore

http://www.mpi.de/

http://www.ifi.uzh.ch/

http://www.uni-sb.de/courses/

http://www.universities.de/Saarbruecken http://www.ifi.uzh.ch/

http://www.ifi.uzh.ch/i_am_a_URI

http://www.ifi.uzh.ch/i_am_a_URI

http://www.ifi.uzh.ch/i_am_a_URI

http://www.ifi.uzh.ch/i_am_a_URI

http://www.ifi.uzh.ch/i_am_a_URI

http://www.ifi.uzh.ch/i_am_a_URI

http://www.lubm.org/teaches

http://www.lubm.org/

Page 44: Thu bernstein key_warp_speed


Page 45: Thu bernstein key_warp_speed

Semantic Flow Processing

Page 46: Thu bernstein key_warp_speed

Semantic Flow Processing


Page 47: Thu bernstein key_warp_speed

ViSTA-TV Viewership Data

Figure: number of valid data entries (0 to 30,000) over day 1, day 2 and day 3 (0h to 24h per day)

ViSTA-TV project: UserLog and EPG streams

Page 48: Thu bernstein key_warp_speed

Semantic Flow Processing is:

• Time-stamped triples t = <s,p,o> [time]
• Semantic flow F = [t1, t2, ... tn]
• Query matching is performed on a cached subset of F
• Subject to stress: the incoming data rate overwhelms the system's processing capability
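A minimal sketch of this data model, assuming millisecond timestamps and hypothetical names (`SemanticFlow`, `TimedTriple`, `cached`, `join`): the cache keeps the part of the flow inside a time window, and a UserLog triple (user, <watch>, channel) joins with the cached EPG triples (channel, <Play>, show), the kind of two-way join used in the ViSTA-TV example.

```scala
object SemanticFlow {
  /** A time-stamped triple t = <s, p, o> [time]; time in milliseconds is an assumption. */
  final case class TimedTriple(s: String, p: String, o: String, time: Long)

  /** Cached part of the flow: triples from the last `windowMs` milliseconds up to `now`. */
  def cached(flow: Seq[TimedTriple], now: Long, windowMs: Long): Seq[TimedTriple] =
    flow.filter(t => t.time <= now && now - t.time <= windowMs)

  /** Two-way join: a UserLog triple (user, <watch>, channel) matches every cached
    * EPG triple (channel, <Play>, show); each result pairs the user with the show. */
  def join(watch: TimedTriple, epgCache: Seq[TimedTriple]): Seq[(String, String)] =
    epgCache.collect { case epg if epg.p == "<Play>" && epg.s == watch.o => watch.s -> epg.o }
}
```

Under stress, the cache cannot hold every triple in the window, which is where the eviction strategies discussed next come in.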

Page 49: Thu bernstein key_warp_speed

Context: Time Window

last 1 min

?within 1 sec

Page 50: Thu bernstein key_warp_speed

Load Shedding

last 1 min

Page 51: Thu bernstein key_warp_speed

Eviction

last 1 min

?

Page 52: Thu bernstein key_warp_speed

Eviction is:

• Removing cached data
  • Note: eviction may lower recall
  • Evict according to the data's potential to produce future results
• Types of eviction strategies considered:
  • Random
  • Time-based (i.e., FIFO)
  • Least Recently Used (LRU)
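FIFO and LRU differ only in what counts as "old". In this sketch (hypothetical `BoundedCache`), both policies keep keys in eviction order, but LRU moves a key to the back whenever a lookup hits it; random eviction is left out to keep the example deterministic.

```scala
import scala.collection.mutable.ArrayBuffer

final class BoundedCache(capacity: Int, lru: Boolean) {
  require(capacity > 0)
  // Keys from next-to-evict (front) to most recently inserted or used (back).
  private val order = ArrayBuffer.empty[String]

  private def touch(key: String): Unit = { order -= key; order += key }

  /** Returns whether the key is cached; under LRU a hit refreshes its recency. */
  def lookup(key: String): Boolean = {
    val hit = order.contains(key)
    if (hit && lru) touch(key) // FIFO ignores hits
    hit
  }

  /** Inserts a key, evicting the front key when full; returns the evicted key, if any. */
  def insert(key: String): Option[String] =
    if (order.contains(key)) { if (lru) touch(key); None }
    else {
      val evicted = if (order.size >= capacity) Some(order.remove(0)) else None
      order += key
      evicted
    }

  def keys: List[String] = order.toList
}
```

With capacity 2, inserting A and B, looking up A and then inserting C evicts A under FIFO but B under LRU.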

Page 53: Thu bernstein key_warp_speed

Why not LRU?

Two-way join of UserLog and EPG

Figure: EPG cache managed by LRU, holding channels C and B

Input: UserLog (User 1, <watch>, Channel B) → Join

Page 54: Thu bernstein key_warp_speed

Why not LRU?

Figure: EPG cache managed by LRU, holding channels C and B

Input: UserLog (User 1, <watch>, Channel C) → Join

Page 55: Thu bernstein key_warp_speed

Why not LRU?

Figure: EPG cache managed by LRU, holding channels B and C

Input: EPG (Channel D, <Play>, Show 4) → D is added to the EPG cache

Input: UserLog (User 3, <watch>, Channel ?)

Page 56: Thu bernstein key_warp_speed

Why not LRU?

Figure: EPG cache managed by LRU, holding channels C, B and D

Inputs: EPG (Channel D, <Play>, Show 4), (Channel E, <Play>, Show 5), (Channel F, <Play>, Show 6), (Channel G, <Play>, Show 7), (Channel H, <Play>, Show 8)

Page 57: Thu bernstein key_warp_speed

CLOCK is:

• Considers both recency (as LRU does) and past results
• Gives each data entry a score
• The score can be incremented and depreciated
• Named after the buffer management algorithm CLOCK
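The CLOCK walk-through on the following slides can be reproduced with a small sketch. Its rules are assumptions read off the slides rather than a published specification: new entries start at score 1, a join hit adds 1, the clock hand sweeps the cache applying the linear dep(w) = w - 1, and the first entry whose score reaches 0 is replaced (`ClockCache` and its methods are hypothetical).

```scala
import scala.collection.mutable.ArrayBuffer

final class ClockCache(capacity: Int) {
  require(capacity > 0)
  private val keys = ArrayBuffer.empty[String]
  private val scores = ArrayBuffer.empty[Int]
  private var hand = 0 // next position the clock hand inspects

  def scoreOf(key: String): Option[Int] = {
    val i = keys.indexOf(key)
    if (i >= 0) Some(scores(i)) else None
  }

  /** The cached entry produced a join result: increment its score. */
  def hit(key: String): Unit = {
    val i = keys.indexOf(key)
    if (i >= 0) scores(i) += 1
  }

  /** Adds an entry with score 1. When full, the hand sweeps round, applying
    * dep(w) = w - 1 to each score, and replaces the first entry that reaches 0. */
  def insert(key: String): Option[String] =
    if (keys.size < capacity) { keys += key; scores += 1; None } // append with init = 1
    else {
      var victim = -1
      while (victim < 0) {
        scores(hand) -= 1 // dep(w) = w - 1
        if (scores(hand) <= 0) victim = hand
        hand = (hand + 1) % capacity
      }
      val evicted = keys(victim)
      keys(victim) = key
      scores(victim) = 1 // init = 1
      Some(evicted)
    }
}
```

Starting from scores 6 (A, a placeholder for the unlabelled entry), 1 (B) and 3 (C), a join on C raises C to 4; inserting D then lowers A to 5 and evicts B, and inserting E lowers C to 3 and A to 4 before evicting D, the same sequence as on the slides.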

Page 58: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 6, 1 (B) and 3 (C)

Input: UserLog (User 2, <watch>, Channel C) → Join

Page 59: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 6, 1 (B) and 4 (C)

Input: UserLog (User 2, <watch>, Channel C) → Join

Page 60: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 6, 1 (B) and 4 (C)

Input: EPG (Channel D, <Play>, Show 4) → dep()

Page 61: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 5, 1 (B) and 4 (C)

Input: EPG (Channel D, <Play>, Show 4) → dep()

Page 62: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 5, 0 (B) and 4 (C)

Input: EPG (Channel D, <Play>, Show 4) → dep()

Page 63: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 5, init = 1 (D) and 4 (C)

Input: EPG (Channel D, <Play>, Show 4) → dep()

Page 64: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 5, 1 (D) and 4 (C)

Input: EPG (Channel E, <Play>, Show 5) → dep()

Page 65: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 5, 1 (D) and 3 (C)

Input: EPG (Channel E, <Play>, Show 5) → dep()

Page 66: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 4, 1 (D) and 3 (C)

Input: EPG (Channel E, <Play>, Show 5) → dep()

Page 67: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 4, 0 (D) and 3 (C)

Input: EPG (Channel E, <Play>, Show 5) → dep()

Page 68: Thu bernstein key_warp_speed

CLOCK

Figure: EPG cache with CLOCK scores 4, 1 (E) and 3 (C)

Next inputs: EPG (Channel F, <Play>, Show 6), (Channel G, <Play>, Show 7), (Channel H, <Play>, Show 8)

Page 69: Thu bernstein key_warp_speed

CLOCK Eviction is:

• The weight is adjusted by dep():
  • Linear: dep(w) = w - 1
  • Exponential: dep(w) = w * ρ (0 < ρ < 1)

Figure labels: Recency, History
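The two depreciation functions forget history at different speeds. The sketch below (hypothetical `Depreciation`) counts how many sweeps bring a score down to a threshold; the threshold itself is an assumption, since the slides do not say when an exponentially depreciated score becomes evictable.

```scala
object Depreciation {
  /** Linear depreciation: dep(w) = w - 1. */
  def linear(w: Double): Double = w - 1

  /** Exponential depreciation: dep(w) = w * rho, with 0 < rho < 1. */
  def exponential(rho: Double)(w: Double): Double = {
    require(rho > 0 && rho < 1, "rho must lie in (0, 1)")
    w * rho
  }

  /** Number of applications of dep that bring a score from w0 to `threshold` or below. */
  def sweepsUntil(w0: Double, dep: Double => Double, threshold: Double): Int =
    Iterator.iterate(w0)(dep).indexWhere(_ <= threshold)
}
```

From a score of 6, linear depreciation needs 6 sweeps to reach 0, while exponential depreciation drops below 1 after 3 sweeps with ρ = 0.5 and after 35 sweeps with ρ = 0.95, one of the factors listed on the depreciation-function slide.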

Page 70: Thu bernstein key_warp_speed

Experimental Results: TV Viewership Data

Figure: recall (0% to 100%) against cache size (100%, 50%, 25%, 15%, 1%) for Random, FIFO, LRU and CLOCK eviction

Two-way join query with 1,919,216 input triples

Page 71: Thu bernstein key_warp_speed

Experimental Results: Depreciation Function

Exponential factor ρ ∈ {0.25, 0.5, 0.75, 0.95}

Page 72: Thu bernstein key_warp_speed

Limitations

• Static depreciation function: dep()

• Experiment on other datasets

• Real implementation to investigate other performance metrics

• Only local eviction strategies considered

Page 73: Thu bernstein key_warp_speed


Semantic Web Reasoning

KB: Asserted Triples → Entailed KB: Asserted & Inferred Triples

DL Reasoning, Inductive R., Analogical R., Your R.

Signal/Collect

P. Stutz, A. Bernstein, and W. Cohen. Signal/Collect: Graph algorithms for the (semantic) web. International Semantic Web Conference–ISWC 2010. Springer Berlin Heidelberg, 2010. 764-780.

CLOCK Eviction is:

• The weight is adjusted by dep():
  • Linear: dep(w) = w - 1
  • Exponential: dep(w) = w * ρ (0 < ρ < 1)

Figure labels: Recency, History

Page 74: Thu bernstein key_warp_speed


Philip Stutz, Mihaela Verman, Shen Gao, Daniel Strebel, Bibek Paudel, Lorenz Fischer, Thomas Keller, Robin Hafen, Genc Mazlami