intel research petros maniatis, irb declarative overlays petros maniatis joint work with tyson...

41
Intel Intel Research Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis Petros Maniatis joint work with Tyson Condie, David joint work with Tyson Condie, David Gay, Gay, Joseph M. Hellerstein, Boon Thau Loo, Joseph M. Hellerstein, Boon Thau Loo, Raghu Ramakrishnan, Sean Rhea, Raghu Ramakrishnan, Sean Rhea, Timothy Roscoe, Atul Singh, Ion Stoica Timothy Roscoe, Atul Singh, Ion Stoica IRB, UC Berkeley, U Wisconsin, Rice IRB, UC Berkeley, U Wisconsin, Rice

Upload: anna-dempsey

Post on 27-Mar-2015

216 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

IntelIntel Research Research Petros Maniatis, IRB

Declarative OverlaysDeclarative Overlays

Petros ManiatisPetros Maniatisjoint work with Tyson Condie, David Gay,joint work with Tyson Condie, David Gay,Joseph M. Hellerstein, Boon Thau Loo,Joseph M. Hellerstein, Boon Thau Loo,Raghu Ramakrishnan, Sean Rhea,Raghu Ramakrishnan, Sean Rhea,Timothy Roscoe, Atul Singh, Ion StoicaTimothy Roscoe, Atul Singh, Ion StoicaIRB, UC Berkeley, U Wisconsin, RiceIRB, UC Berkeley, U Wisconsin, Rice

Page 2: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

2

Petros Maniatis, IRBIntelIntel Research Research

Overlays Everywhere…Overlays Everywhere…““Overlay”: the routing and message forwarding Overlay”: the routing and message forwarding

component of any self-organizing distributed systemcomponent of any self-organizing distributed system

Internet

Overlay

Page 3: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

3

Petros Maniatis, IRBIntelIntel Research Research

Overlays Everywhere…Overlays Everywhere… Many examples:Many examples:

Internet Routing, multicastInternet Routing, multicast

Content delivery, file sharing, DHTs, GoogleContent delivery, file sharing, DHTs, Google

Microsoft Exchange (for load balancing)Microsoft Exchange (for load balancing)

Tibco (technology bridging)Tibco (technology bridging)

Overlays are a fundamental tool for repurposing Overlays are a fundamental tool for repurposing communication infrastructurescommunication infrastructures

Get a bunch of friends together and build your Get a bunch of friends together and build your own ISP (Internet evolvability)own ISP (Internet evolvability)

You don’t like Internet Routing? Make up your You don’t like Internet Routing? Make up your own rules (RON)own rules (RON)

Paranoid? Run a VPN in the wide areaParanoid? Run a VPN in the wide area

Intrusion detection with friends (FTN, Polygraph)Intrusion detection with friends (FTN, Polygraph)

Have your assets discover each other (iAMT)Have your assets discover each other (iAMT)

Internet

Overlay

Distributed systems innovation Distributed systems innovation needsneeds overlays overlays

Page 4: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

4

Petros Maniatis, IRBIntelIntel Research Research

If only it weren’t so hardIf only it weren’t so hard In theoryIn theory

Figure out right propertiesFigure out right properties

Get the algorithms and protocols Get the algorithms and protocols

Implement themImplement them

Tune themTune them

Test themTest them

Debug themDebug them

RepeatRepeat

But in practiceBut in practice

No global viewNo global view

Wrong choice of algorithmsWrong choice of algorithms

Incorrect implementationIncorrect implementation

Psychotic timeoutsPsychotic timeouts

Partial failuresPartial failures

Impaired introspectionImpaired introspection

Homicidal boredomHomicidal boredom

It’s hard enough as it isIt’s hard enough as it is

Do I also need to reinvent the wheel every time?Do I also need to reinvent the wheel every time?

Page 5: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

5

Petros Maniatis, IRBIntelIntel Research Research

Our GoalOur Goal Make overlay development more accessible to Make overlay development more accessible to

developers of distributed applicationsdevelopers of distributed applications

Specify overlay at a high-levelSpecify overlay at a high-level

Automatically translate specification into executableAutomatically translate specification into executable

Hide everything they don’t want to touchHide everything they don’t want to touch

Enjoy performance that is Enjoy performance that is good enoughgood enough

Do for networked systems what the relational Do for networked systems what the relational revolution did for databasesrevolution did for databases

Page 6: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

6

Petros Maniatis, IRBIntelIntel Research Research

Enter P2: SemanticsEnter P2: Semantics Distributed stateDistributed state

Distributed soft state in relational tables, holding tuples of valuesDistributed soft state in relational tables, holding tuples of values

route (S, D, H)route (S, D, H)

Non-stored information passes around as Non-stored information passes around as event tuple streamsevent tuple streams

message (X, D)message (X, D)

Overlay specification in declarative logic language (OverLog)Overlay specification in declarative logic language (OverLog)

<head> :- <precondition1>, <precondition2>, … , <preconditionN>.<head> :- <precondition1>, <precondition2>, … , <preconditionN>.

Location specifiers Location specifiers @Loc@Loc placeplace individual tuples at specific nodes individual tuples at specific nodes

message@H(H, D) :- route@S(S, D, H), message@S(S, D).message@H(H, D) :- route@S(S, D, H), message@S(S, D).

(a, y, c)

(a, z, r)

(a, z, t)

message@a(a, z)

message@r(r, z)

message@t(t, z)

t

r

Page 7: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

7

Petros Maniatis, IRBIntelIntel Research Research

Enter P2: DataflowEnter P2: Dataflow Specification automatically translated to a dataflow Specification automatically translated to a dataflow

graphgraph C++ dataflow elements (akin to Click elements)C++ dataflow elements (akin to Click elements)

ImplementImplement relational operators (joins, selections, projections)relational operators (joins, selections, projections)

flow operators (multiplexers, demultiplexers, queues)flow operators (multiplexers, demultiplexers, queues)

network operators (congestion control, retry, rate network operators (congestion control, retry, rate limitation)limitation)

Interlinked via asynchronous push or pull typed flowsInterlinked via asynchronous push or pull typed flows Pull carries a callback from the puller in case it failsPull carries a callback from the puller in case it fails

Push always succeeds, but halts subsequent pushesPush always succeeds, but halts subsequent pushes

Execution engine runs the dataflow graphExecution engine runs the dataflow graph Simple FIFO event scheduler (a la libasync) for I/O, Simple FIFO event scheduler (a la libasync) for I/O,

alarms, deferred execution, etc.alarms, deferred execution, etc.

demux

A distributed query processor to maintain overlaysA distributed query processor to maintain overlays

Page 8: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

8

Petros Maniatis, IRBIntelIntel Research Research

Example: Ring Routing Example: Ring Routing Every node has an Every node has an addressaddress (e.g., (e.g.,

IP address) and an IP address) and an identifier identifier (large random)(large random)

Every object has an Every object has an identifieridentifier

Order nodes and objects into a Order nodes and objects into a ring by their identifiersring by their identifiers

Objects “served” by their Objects “served” by their successor nodesuccessor node

Every node knows its Every node knows its successorsuccessor on the ringon the ring

To find object To find object KK, walk around the , walk around the ring until I locate K’s immediate ring until I locate K’s immediate successor nodesuccessor node

3

28

15

1840

60

58 13

37

0

56

42

222433

Page 9: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

9

Petros Maniatis, IRBIntelIntel Research Research

Example: Ring Routing Example: Ring Routing How do I find the

responsible node for a given key k?

n.lookup(k)

if k in (n, n.successor)

return n.successor

else

return n.successor. lookup(k)

3

28

15

1840

60

58 13

37

Page 10: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

10

Petros Maniatis, IRBIntelIntel Research Research

Ring StateRing State n.lookup(k)

if k in (n, n.successor)

return n.successor

else

return n.successor. lookup(k)

Node state tuples

node(NAddr, N)

successor(NAddr, Succ, SAddr)

Transient event tuples

lookup (NAddr, Req, K)

3

28

15

1840

60

58 13

37

Page 11: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

11

Petros Maniatis, IRBIntelIntel Research Research

Pseudocode to OverLogPseudocode to OverLog n.lookup(k)

if k in (n, n.successor]

return n.successor

else

return n.successor. lookup(k)

Node state tuples

node(NAddr, N)

successor(NAddr, Succ, SAddr)

Transient event tuples

lookup (NAddr, Req, K)

R1 response (Req, K, SAddr) :-

lookup (NAddr, Req, K),

node (NAddr, N),

succ (NAddr, Succ, SAddr),

K in (N, Succ].

Page 12: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

12

Petros Maniatis, IRBIntelIntel Research Research

Pseudocode to OverLogPseudocode to OverLog n.lookup(k)

if k in (n, n.successor]

return n.successor

else

return n.successor. lookup(k)

Node state tuples

node(NAddr, N)

successor(NAddr, Succ, SAddr)

Transient event tuples

lookup (NAddr, Req, K)

R1 response (Req, K, SAddr) :-

lookup (NAddr, Req, K),

node (NAddr, N),

succ (NAddr, Succ, SAddr),

K in (N, Succ].

R2 lookup (SAddr, Req, K) :-

lookup (NAddr, Req, K),

node (NAddr, N),

succ (NAddr, Succ, SAddr),

K not in (N, Succ].

Page 13: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

13

Petros Maniatis, IRBIntelIntel Research Research

Location SpecifiersLocation Specifiers n.lookup(k)

if k in (n, n.successor]

return n.successor

else

return n.successor. lookup(k)

Node state tuples

node(NAddr, N)

successor(NAddr, Succ, SAddr)

Transient event tuples

lookup (NAddr, Req, K)

R1 response@Req(Req, K, SAddr) :-

lookup@NAddr(NAddr, Req, K),

node@NAddr(NAddr, N),

succ@NAddr(NAddr, Succ, SAddr),

K in (N, Succ].

R2 lookup@SAddr(SAddr, Req, K) :-

lookup@NAddr(NAddr, Req, K),

node@NAddr(NAddr, N),

succ@NAddr(NAddr, Succ, SAddr),

K not in (N, Succ].

Page 14: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

14

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

lookup

Page 15: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

15

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node

Joinlookup.NI ==

node.NIlookup NI, R, K, N

Page 16: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

16

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NIlookup NI, R, K, N, S, SI

Page 17: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

17

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]lookup

NI, R, K, N, S, SIK in (N, S]

Page 18: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

18

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)lookup response

Page 19: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

19

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

lookup

lookup

response

NI, R, K, N, S, SI

Page 20: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

20

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK not in (N, S]

lookup

lookup

response

NI, R, K, N, S, SIK not in (N, S]

Page 21: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

21

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to DataflowR1 response@R(R, K, SI) : - lookup@NI(NI, R, K),

node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].

R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K),node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK not in (N, S]

Projectlookup@SI(SI, R, K)

lookup

lookup

response

lookup

Page 22: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

22

Petros Maniatis, IRBIntelIntel Research Research

From OverLog to DataflowFrom OverLog to Dataflow One rule strand per OverLog rule

Rule order is immaterial

Rule strands could execute in parallel

node succ

Rule R1

Rule R2

lookup

lookup

response

lookup

Page 23: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

23

Petros Maniatis, IRBIntelIntel Research Research

Transport and App LogicTransport and App Logic

node succ

Rule R1lookup

lookup Rule R2

...

...

Dem

uxQ

ueue

UD

P

Tx

UD

P

Rx

CC

R

x

Round

Robin

Queue

...

...

CC

T

x

Page 24: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

24

Petros Maniatis, IRBIntelIntel Research Research

A Bit of ChordA Bit of ChordL1 Join

lookup.NI == node.NI

Joinlookup.NI ==bestSucc.NI

TimedPullPush0

SelectK in (N, S]

ProjectlookupRes

MaterializationsInsert

Insert

Insert

L3 TimedPullPush0

JoinbestLookupDist.NI

== node.NI

L2 TimedPullPush0

JoinbestLookupDist.NI

== node.NI

node

Demux(@local?)

TimedPullPush

0

Network OutQueueremote

local

Netw

ork In

best

Look

upD

ist

finge

rbe

stSu

cc

bestSucc

look

up

Mux

TimedPullPush

0Q

ueue

Dup

finger

node

RoundR

obin

Dem

ux(tuple nam

e)

Agg min<D>on finger

D:=K-B-1, B in (N,K)

Agg min<BI>on finger

D==K-B-1, B in (N,K)

Page 25: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

25

Petros Maniatis, IRBIntelIntel Research Research

Chord on P2Chord on P2 Full specification of ToN ChordFull specification of ToN Chord

Multiple successorsMultiple successors

StabilizationStabilization

Failure recoveryFailure recovery

Optimized finger maintenanceOptimized finger maintenance

46 OverLog rules46 OverLog rules

(1 USletter page, 10pt font) (1 USletter page, 10pt font)

How do we know it works?How do we know it works? Same high-level propertiesSame high-level properties

Logarithmic overlay diameterLogarithmic overlay diameter

Logarithmic state sizeLogarithmic state size

Consistent routing with churnConsistent routing with churn

““Comparable” performance to hand-Comparable” performance to hand-coded implementationscoded implementations

Page 26: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

26

Petros Maniatis, IRBIntelIntel Research Research

Lookup length in hops(no churn)

Lookup length in hops(no churn)

Page 27: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

27

Petros Maniatis, IRBIntelIntel Research Research

Maintenance bandwidth(no churn)

Maintenance bandwidth(no churn)

Page 28: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

28

Petros Maniatis, IRBIntelIntel Research Research

Lookup Latency(no churn)

Lookup Latency(no churn)

Page 29: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

29

Petros Maniatis, IRBIntelIntel Research Research

Lookup Latency(with churn)

Lookup Latency(with churn)

Page 30: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

30

Petros Maniatis, IRBIntelIntel Research Research

Lookup Consistency(with churn)

Lookup Consistency(with churn)

Consistent fraction: size fraction of largest result clusterConsistent fraction: size fraction of largest result cluster

k lookups, different sources, same destinationk lookups, different sources, same destination

Page 31: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

31

Petros Maniatis, IRBIntelIntel Research Research

Maintenance bandwidth(churn)

Maintenance bandwidth(churn)

Page 32: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

32

Petros Maniatis, IRBIntelIntel Research Research

But Still a Research PrototypeBut Still a Research Prototype Bugs still creep up (algorithmic logic / P2 impl.)Bugs still creep up (algorithmic logic / P2 impl.)

Multi-resolution system introspectionMulti-resolution system introspection

Application-specific network tuning, auto or Application-specific network tuning, auto or otherwise still neededotherwise still needed

Component-based reconfigurable transportsComponent-based reconfigurable transports

Logical duplications ripe for removalLogical duplications ripe for removal

Factorizations and Cost-based optimizationsFactorizations and Cost-based optimizations

Page 33: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

33

Petros Maniatis, IRBIntelIntel Research Research

1. System Introspection1. System Introspection Two unique opportunitiesTwo unique opportunities

Transparent execution tracingTransparent execution tracing

A distributed query processor on all system stateA distributed query processor on all system state

Page 34: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

34

Petros Maniatis, IRBIntelIntel Research Research

Execution Tracing and LoggingExecution Tracing and Logging Execution tracing/logging happens externally to system specificationExecution tracing/logging happens externally to system specification

At “pseudo-code” granularity: At “pseudo-code” granularity: logical steppinglogical stepping Why did rule R7 trigger? Under what preconditions?Why did rule R7 trigger? Under what preconditions?

Every rule execution (input and outputs) is exported as a tableEvery rule execution (input and outputs) is exported as a table– ruleExec(Rule, InTuple, OutTuple, OutNode, Time)

At dataflow granularity: At dataflow granularity: intermediate representation steppingintermediate representation stepping Why did that tuple expire? What dropped from that queue?Why did that tuple expire? What dropped from that queue?

Every dataflow element execution exported as a table, flows tapped and exportedEvery dataflow element execution exported as a table, flows tapped and exported– queueExec(…), roundRobinExec(…), …

Transparent logging by the execution engineTransparent logging by the execution engine No need to insert printf’s and hope for the bestNo need to insert printf’s and hope for the best

Can Can traversetraverse execution graph for particular system events execution graph for particular system events Its preconditions, and their preconditions, and so on across the netIts preconditions, and their preconditions, and so on across the net

Page 35: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

35

Petros Maniatis, IRBIntelIntel Research Research

Distributed Query ProcessingDistributed Query Processing Once you have a distributed query processor, lots of things fall off the Once you have a distributed query processor, lots of things fall off the

back of the truckback of the truck

Overlay invariant monitoring: Overlay invariant monitoring: a distributed watchpointa distributed watchpoint

““What’s the average path length?”What’s the average path length?”

““Is routing consistent?”Is routing consistent?”

Pattern matching on distributed execution graphPattern matching on distributed execution graph

““Is a routing entry gossiped in a cycle?”Is a routing entry gossiped in a cycle?”

““How many lookup failures were caused by stale routing state?”How many lookup failures were caused by stale routing state?”

““What are the nodes with best-successor in-degree > 1?”What are the nodes with best-successor in-degree > 1?”

““Which bits of state only occur when a lookup fails somewhere?”Which bits of state only occur when a lookup fails somewhere?”

Monitoring disparate overlays / systems togetherMonitoring disparate overlays / systems together

““When overlay A does this, what is overlay B doing?”When overlay A does this, what is overlay B doing?”

““When overlay A does this, what is the network, average CPU, … doing?”When overlay A does this, what is the network, average CPU, … doing?”

Page 36: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

36

Petros Maniatis, IRBIntelIntel Research Research

2. Reconfigurable Transport2. Reconfigurable Transport New lease on life of an old idea!New lease on life of an old idea!

Dataflow paradigm thins out layer Dataflow paradigm thins out layer boundariesboundaries

Mix and match transport facilities Mix and match transport facilities (retries, congestion control, rate (retries, congestion control, rate limitation, buffering)limitation, buffering)

Spread bits of transport through the Spread bits of transport through the application to suit application application to suit application requirementsrequirements

Move buffering before computationMove buffering before computation

Move retries before route selectionMove retries before route selection

Use single congestion control across Use single congestion control across all destinationsall destinations

Express transport spec at high-levelExpress transport spec at high-level ““Packetize all msgs to same dest Packetize all msgs to same dest

together, but send acks separately”together, but send acks separately”

““Packetize updates but not acks”Packetize updates but not acks”

Queue CC Tx

Demux

RR Sched

CC Rx UDP Rx

UDP TxRoute/ Demux

Ap

plic

atio

n

Ne

two

rk

(a)

Retry

Queue CC Tx

Demux

RR Sched

CC Rx UDP Rx

UDP TxRoute/ Demux

Ap

plic

atio

n

Ne

two

rk

(b)

Retry

... ...

CC Tx

Demux

RR Sched

CC Rx UDP Rx

UDP Tx

Route/ Demux

Ap

plic

atio

n

Ne

two

rk

(c)

Retry

... ...

Buffered Agg

......

...

Page 37: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

37

Petros Maniatis, IRBIntelIntel Research Research

3. Automatic Optimization3. Automatic Optimization Optimize within rulesOptimize within rules

Selects before joins, join orderingSelects before joins, join ordering

Optimize across rules & queriesOptimize across rules & queries

Common “subexpression” Common “subexpression” eliminationelimination

Optimize across nodesOptimize across nodes

Send the smallest relation over Send the smallest relation over the networkthe network

Caching of intermediate resultsCaching of intermediate results

Optimize schedulingOptimize scheduling

Prolific rules before deadbeatsProlific rules before deadbeats

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)YES

Projectlookup@SI(SI, R, K)

NO

lookup

node succ

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK in (N, S]

Projectresponse@R

(R, K, SI)

Joinlookup.NI ==

node.NI

Joinlookup.NI ==

succ.NI

SelectK not in (N, S]

Projectlookup@SI(SI, R, K)

lookup

lookup

Page 38: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

38

Petros Maniatis, IRBIntelIntel Research Research

What We Don’t Know (Yet)What We Don’t Know (Yet) The limits of first-order logicThe limits of first-order logic

Already pushing through to second-order, to do introspectionAlready pushing through to second-order, to do introspection

Can be awkward to translate inherently imperative constructs, etc. if-then-else / loopsCan be awkward to translate inherently imperative constructs, etc. if-then-else / loops

The limits of the dataflow modelThe limits of the dataflow model Control vs. data flowControl vs. data flow

Can we eliminate (most) queues? If not, what’s the point?Can we eliminate (most) queues? If not, what’s the point?

Can we do concurrency control for parallel execution?Can we do concurrency control for parallel execution?

The limits of “automation”The limits of “automation” Can we (ever) do better than hand-coded implementations? Does it matter?Can we (ever) do better than hand-coded implementations? Does it matter?

How good is good enough?How good is good enough?

Will designers settle for auto-generation? DBers did, but this is a different communityWill designers settle for auto-generation? DBers did, but this is a different community

The limits of static checkingThe limits of static checking Can we keep the semantics simple enough for existing checks (termination, safety, …) Can we keep the semantics simple enough for existing checks (termination, safety, …)

to still work automatically?to still work automatically?

Page 39: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

39

Petros Maniatis, IRBIntelIntel Research Research

Related WorkRelated Work Early work on executable protocol specificationEarly work on executable protocol specification

Esterel, Estelle, LOTOS (finite state machine specs)Esterel, Estelle, LOTOS (finite state machine specs)

Morpheus, Prolac (domain-specific, OO)Morpheus, Prolac (domain-specific, OO)

RTAG (grammar model)RTAG (grammar model)

ClickClick

Dataflow approach for routing stacksDataflow approach for routing stacks

Larger elements, more straightforward schedulingLarger elements, more straightforward scheduling

Deductive / active databasesDeductive / active databases

Page 40: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

40

Petros Maniatis, IRBIntelIntel Research Research

SummarySummary Overlays enable distributed system innovationOverlays enable distributed system innovation

We’d better make them easier to build, reuse, understandWe’d better make them easier to build, reuse, understand

P2 enablesP2 enables High-level overlay specification in OverLogHigh-level overlay specification in OverLog

Automatic translation of specification into dataflow graphAutomatic translation of specification into dataflow graph

Execution of dataflow graphExecution of dataflow graph

Explore and Embrace the trade-off between fine-tuning and Explore and Embrace the trade-off between fine-tuning and ease of developmentease of development

Get the full immersion treatment in our papers at Get the full immersion treatment in our papers at SIGCOMM and SOSP ‘05SIGCOMM and SOSP ‘05

Page 41: Intel Research Petros Maniatis, IRB Declarative Overlays Petros Maniatis joint work with Tyson Condie, David Gay, Joseph M. Hellerstein, Boon Thau Loo,

P2

41

Petros Maniatis, IRBIntelIntel Research Research

Questions(a few to get you started)

Questions(a few to get you started)

Who cares about overlays?Who cares about overlays?

Logic? You mean Prolog? Eeew!Logic? You mean Prolog? Eeew!

This language is really ugly. Discuss.This language is really ugly. Discuss.

But what about security?But what about security?

Is anyone ever going to use this?Is anyone ever going to use this?

Is this as revolutionary and inspired as it looks?Is this as revolutionary and inspired as it looks?