1 compositional design and analysis of timing-based distributed algorithms nancy lynch theory of...

53
1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston, Virginia December 10, 2002

Upload: loraine-marshall

Post on 18-Jan-2018

217 views

Category:

Documents


0 download

DESCRIPTION

3 Project Overview Design and analyze distributed algorithms that implement global services with strong guarantees, e.g.: –Reliable communication –Strongly coherent data services Dynamic environment, where processes join, leave, and fail. Algorithms composed of sub-algorithms. Analyze performance conditionally, under various assumptions about timing and failures. Develop underlying mathematical modeling framework, based on interacting state machines (IOA, TIOA), capable of: –Describing precisely all the algorithms we study. –Supporting compositional and conditional analysis.

TRANSCRIPT

Page 1: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

1

Compositional Design and Analysis of Timing-Based Distributed Algorithms

Nancy LynchTheory of Distributed SystemsMIT

Third MURI WorkshopArlington-Ballston, VirginiaDecember 10, 2002

Page 2: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

2

MIT Participants

• Leader– Nancy Lynch

• Postdoctoral associates– Idit Keidar, Dilsun Kirli

• Graduate students– Roger Khazan, Carl Livadas, Ziv Bar-Joseph, Rui Fan,

Seth Gilbert, Sayan Mitra• Collaborators

– Alex Shvartsman and students, Frits Vaandrager, Roberto Segala

Page 3: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

3

Project Overview• Design and analyze distributed algorithms that implement

global services with strong guarantees, e.g.: – Reliable communication– Strongly coherent data services

• Dynamic environment, where processes join, leave, and fail.• Algorithms composed of sub-algorithms. • Analyze performance conditionally, under various

assumptions about timing and failures. • Develop underlying mathematical modeling framework,

based on interacting state machines (IOA, TIOA), capable of:– Describing precisely all the algorithms we study.– Supporting compositional and conditional analysis.

Page 4: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

4

Algorithms Studied

• Scalable group communication [Khazan, Keidar]• Early-delivery dynamic atomic broadcast

[Bar-Joseph, Keidar, Lynch]• Reconfigurable atomic memory [Lynch, Shvartsman]• Scalable reliable multicast [Livadas, Keidar, Lynch]• In progress:

– Reconfigurable atomic memory– Peer-to-peer: Fault-tolerant location services, data services– Mobile: Topology control, clock synchronization, tracking

Page 5: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

5

This Talk

I. Completed work: Scalable group communication Early-delivery dynamic atomic broadcast

II. Reconfigurable atomic memoryIII. Reliable multicastIV. Modeling frameworkV. Plans for the next two years

Page 6: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

6

I. Completed Work Scalable Group Communication

[Keidar, Khazan 00, 02], [Khazan 02], [Keidar, Khazan, Lynch, Shvartsman 02]

[Taraschanskiy 00]

GCS

Page 7: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

7

Group Communication Services

• Cope with changing participants using abstract groups of client processes with changing membership sets.

• Processes communicate with group members indirectly, by sending messages to the group as a whole.

• GC services support management of groups:– Maintain membership information.

• Form new views in response to changes.– Manage communication.

• Communication respects views.• Provide guarantees about ordering and reliability of message delivery.• Virtual synchrony

• Applications: Managing replicated data; distributed multiplayer games; collaborative work

Page 8: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

8

Scalable GC Algorithm • Specification, including virtual synchrony.• New algorithm:

– Uses a scalable membership service, implemented on a small set of membership servers.

– Multicast implemented on all the nodes.

– View change uses only one round for state exchange, in parallel with membership service’s agreement on views.

– Participants can join during view formation.

GCS

Net

Memb

GCS

Page 9: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

9

Analysis

• Safety proofs, using incremental proof methods.• Liveness proofs.• Performance analysis:

– Time from when network stabilizes until GCS announces final view.– Message latency.– Conditional analysis, based on input, failure, and timing assumptions.– Compositional analysis, based on performance of membership service

and Net.• Modeled and analyzed data-management application

running on top of the new GCS.• Distributed implementation [Taraschanskiy 00].

S S’A A’

Page 10: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

10

Completed Work:

Early-Delivery Dynamic Atomic Broadcast

[Bar-Joseph, Keidar, Lynch 02]

DAB

Page 11: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

11

Dynamic Atomic Broadcast

• Atomic broadcast, where processes may join, leave, or fail.• Safety: Sending, receiving orders are consistent with a single

global message ordering (no gaps).• Liveness: Eventual completion of joins, leaves. Eventual

delivery, including the process’ own messages.• Fast delivery, even with joins, leaves.• Application: Distributed multiplayer interactive games.

joinleavemcast(m)

join-ackleave-ackrcv(m)

DAB

Page 12: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

12

Implementing DAB

• Processes:– Timing-dependent, have approximately-synchronized clocks.

• Net:– Pairwise FIFO delivery– Low latency– But does not guarantee a single total order, nor that all processes see

the same messages from a failing process.

join

net-join

DAB

Net

Page 13: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

13

Dynamic Atomic Bcast Algorithm

• Processes coordinate message delivery:– Divide time into slots using local clock, assign messages to slots.– Deliver messages in order of (slot, sender id).– Determine members of each slot, deliver only messages from members.

• Processes must agree on slot membership:– Joining process selects join-slot, informs others. – Similarly for leaving process.– Failed process results in consensus on failure slot.

• Requires a new kind of consensus service: Consensus with Uncertain Participants (CUP).– Participants not known a priori.– Each participant submits its perceived “World”.– Processes may abstain.

Page 14: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

14

DABi1 DABi2

CUP(j)

DAB

Net

fail fail

The DAB Algorithm Using CUP

Page 15: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

15

Consensus with Uncertain Participants

• CUP Problem:– Guarantees agreement, validity, termination.– Assumes submitted worlds are “close”:

• Process that initiates is in other processes’ worlds• Process in anyone’s world initiates, abstains, leaves, or fails.

• CUP Algorithm– A new early-stopping consensus algorithm.– Similar to [Dolev, Reischuk, Strong 90], but:

• Tolerates uncertainty about participants.• Tolerates processes leaving.

– Terminates in two rounds when failures stop, even if leaves continue.– Latency linear in number of actual failures

Page 16: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

16

Analysis

• Compositional analysis: Properties of CUP used to prove properties of DAB:– Safety: CUP agreement and validity imply DAB atomic broadcast

consistency guarantees.– Liveness: CUP safety and liveness properties (e.g., termination) imply

DAB liveness properties (e.g., eventual delivery).– Latency: CUP decision bounds imply DAB message delay bounds.

• Message latency:– No failures: Constant, even when participants join and leave.– With failures: Linear in the number of failures.– Improves upon algorithms using group communication.

Page 17: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

17

II. Reconfigurable Atomic Memory for Dynamic Distributed Environments

[Lynch, Shvartsman 02]

RAMBO

Page 18: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

18

Reconfigurable Atomic Memory• Implement atomic read/write shared memory in a

dynamic network setting.– Participants may join, leave, fail.– Mobile networks, peer-to-peer networks.

• High availability, low latency.• Atomicity for all patterns of asynchrony and change.• Good performance under reasonable limits on

asynchrony and change.• Applications:

– Battle data for teams of soldiers in military operation.– Game data for players in multiplayer game.

Page 19: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

19

Approach: Dynamic Quorums

• Objects are replicated at several network locations.• To accommodate small, transient changes:

– Uses quorum configurations: members, read-quorums, write-quorums.

– Maintains atomicity during stable situations.– Allows concurrency.

• To handle larger, more permanent changes:– Reconfigure– Maintains atomicity across configuration changes.– Any configuration can be installed at any time.– Reconfigure concurrently with reads/writes; no

heavyweight view change.

Page 20: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

20

RAMBO

• Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).

• Global service specification:

• Algorithm: – Reads and writes objects.– Chooses new configurations, notifies members.– Identifies, garbage-collects obsolete configurations.– All concurrently.

RAMBO

Page 21: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

21

• Main algorithm + reconfiguration service• Loosely coupled• Recon service:

– Provides a consistent sequence of configurations.

• Main algorithm: – Handles reading, writing.– Receives, disseminates new configuration information; no

formal installation.– Garbage-collects old configurations.– Reads/writes may use several configurations.

Recon

Net

Recon

RRAMBO

RAMBO Algorithm Structure

Page 22: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

22

Main algorithm: Reading and Writing

• Run a version of the standard static two-phase quorum-based read/write algorithm [Vitanyi, Awerbuch], [Attiya, Bar-Noy, Dolev].

• Use all current configurations.

read,write

Net

Recon

new-config

Page 23: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

23

Static Read/write Protocol

• Quorum configuration:– read-quorums, write-quorums– For any RR in read-quorums, WW in write-quorums, RR W W ..

• Replicate object at all locations.• At each location, keep:

– value– tag = (sequence number, location)

• Read, write use two phases:– Phase 1: Read (value, tag) from a read-quorum– Phase 2: Write (value,tag) to a write-quorum

• Highly concurrent.• Quorum intersection implies atomicity

Page 24: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

24

Static Read/write Protocol Details• Write at location i:

– Phase 1: • Read (value, tag) from a read-quorum.• Determine largest seq-number among the tags that are read. • Choose new-tag := (larger sequence-number, i).

– Phase 2: • Propagate (new-value, new-tag) to a write-quorum.

• Read at location i:– Phase 1:

• Read (value, tag) from a read-quorum.• Determine largest (value,tag) among those read.

– Phase 2:• Propagate this (value,tag) to a write-quorum.• Return value.

Page 25: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

25

Dynamic Read/write Protocol• Perform two-phase static protocol, using all current

configurations.– Phase 1: Collect object values from read-quorums of current

configurations.– Phase 2: Propagate latest value to write-quorums of current

configurations.• When new configuration is provided by Recon:

– Start using it too.– Do not abort reads/writes in progress, but do extra work to access

additional processes needed for new quorums.• Our communication mechanism:

– Background gossiping– Terminate by fixed-point condition, involving a quorum from each

active configuration.

Page 26: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

26

Removing Old Configurations

• Garbage-collect them in the background.• Two-phase garbage-collection procedure:

– Phase 1: • Inform write-quorum of old configuration about the

new configuration. • Collect object values from read-quorum of the old

configuration.– Phase 2:

• Propagate the latest value to a write-quorum of the new configuration.

• Garbage-collection concurrent with reads/writes.• Implemented using gossiping and fixed points.

Page 27: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

27

Implementation of Recon• Uses distributed consensus to determine successive

configurations 1, 2, 3,…

• Members of old configuration propose new configuration.• Proposals reconciled using consensus.• Consensus is a heavyweight mechanism, but:

– Only used for reconfigurations, infrequent.– Does not delay read/write operations.

Consensus

Recon

Net

Page 28: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

28

Consensus Implementation

• Use a variant of timing-based Paxos algorithm [Lamport]• Agreement, validity guaranteed absolutely (independent of

timing).• Termination guaranteed when underlying system stabilizes.• Leader chosen using failure detectors; conducts two-phase

algorithm with retries.

decide(v)init(v)

init(v)

Consensus

Page 29: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

29

Analysis

• We prove atomicity for arbitrary patterns of asynchrony and change, using partial order methods.

• Analyze performance conditionally, based on failure and timing assumptions.

• E.g., under reasonable “steady-state” assumptions:– Removing old configurations takes time at most 6d.– Reads and writes take time at most 8d.

• LAN implementation [Musial 02].

Page 30: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

30

Other Approaches• Use consensus to agree on total order of operations:

[Lamport 89]– Not resilient to transient failures. – Termination of reads/writes depends on termination of

consensus.• Totally-ordered broadcast over group communication:

[Amir, Dolev, Melliar-Smith, Moser 94], [Keidar, Dolev 96]– View formation takes a long time, delays reads/writes.– One change may trigger view formation.

Page 31: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

31

III. Reliable Multicast Protocols[Livadas, Keidar, Lynch 01],

[Livadas, Keidar 02], [Livadas, Lynch 02]

Page 32: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

32

Physical System Model

• Infinite # of symmetric hostsi.e., same resources, processes

• Network of interconnected routers• Failures: fail-stop host crashes and packet drops

r1r2

r6 r4r5

r3h1

h2

h3

h4

h5

h6

Page 33: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

33

Reliable Multicast Service (RMS)

Overview:– Single reliable multicast group & single client process/host– RM() encompasses behavior of all other processes on hosts and

functionality of underlying network– Parameter bounds the reliable delivery delay

Membership:– A host becomes a member of the group upon the acknowledgment of

its join request– A host ceases to be a member upon issuing a leave request

RM-Client1 RM-Client2

RM()

rm-join1 rm-join-ack1

RM-Client1

rm-send1(p)

RM-Client1

rm-recv2(p)

RM-Client1RM-Client1

rm-leave1

RM-Client1

Page 34: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

34

Multicast Reliability: Properties

Let h,s be hosts and p,p’ be packets from s such that p<p’

Eventual Delivery: If p’ remains active forever after its transmission, h delivers p, and h remains a member thereafter, then h delivers p’.

Time-Bounded Delivery: Let T denote the time interval ranging from the transmission time t of p’ to the point in time time units past t.If p’ remains active throughout T, h delivers p prior to the expiration of T, and h remains a member thereafter within T, then h delivers p’ within T.

Page 35: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

35

Reliable Multicast Implementation (RMI)

• Scalable Reliable Multicast (SRM) [Floyd et. al., 97]– Retransmission-based protocol using NACKs– Uses best-effort IP multicast as communication primitive

• Augment SRM so as to precisely specify:– when a host becomes a member of the group– which packets each member should attempt to recover

Page 36: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

36

SRM’s Recovery Scheme

• Each host schedules a request for each missing packet• Any capable host schedules a reply to each such request• Duplicate requests/replies limited using deterministic and probabilistic

suppression schemes

hh’

s

rqstrepl

Page 37: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

37

IP-mcast

RM-Client1

RM-mem1RM-rep1

RM-Client2

RM-mem2RM-rep2

RM-IPbuff1 RM-rec1 RM-IPbuff2 RM-rec2

RMI Timed I/O Automaton Model

Page 38: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

38

Analysis of RMI

Correctness Analysis:RMI implements RMS; i.e., RMI delivers appropriate packets to appropriate members of the reliable multicast group as dictated by RMS.

Conditional Timeliness Analysis:Presuming no leaves, no crashes, bounded transmission latencies and latency estimates, bounded loss detection delays, and a fixed number k of packet drops per packet transmission/recovery, packets are guaranteed delivery within particular delivery delay upper bound (k).

Page 39: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

39

Byproduct of RMI Timeliness Analysis

• Constraints on SRM scheduling parameters– C3 < C1 : back-off abstinence does not affect next round

requests– D1 + D2 + 2 < 2 C1: replies received prior to transmission

of next round requests– D1 + D2 + D3 < 2 C1: requests not discarded due to prior

round reply abstinence• Violating these guidelines may lead to superfluous

traffic and unwarranted recovery round failure

Page 40: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

40

Caching-Enhanced SRM (CESRM)

• Enhance SRM with caching scheme– determines and caches optimal requestor/replier pair for each loss– expedites recovery of losses based on requestor/replier pair cache

hh’

s

exp-rqst

exp-repl

Page 41: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

41

CESRM Timed I/O Automaton Model

IP-mcast

RM-Client1

RM-mem1RM-rep1

RM-Client2

RM-mem2RM-rep2

RM-IPbuff1 RM-rec1 RM-IPbuff2 RM-rec2RM-IPbuff1 RM-rec1 RM-IPbuff2 RM-rec2

IP-ucast

Page 42: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

42

CESRM: Conditional Timeliness Analysis

Definition: A cache hit corresponds to a recovery scenario in which:– hosts that share the loss also share optimal requestor-replier pair,– the optimal requestor shares the loss, and – the optimal replier does not share the loss.

Claim:For any execution where no recovery packets are dropped, cache hits lead to packet recovery within at most:DET-BOUND+ dreorder-delay+2d+

as opposed to:DET-BOUND+(C1+C2)d++d++(D1+D2)d++d+

For C1=C2=D1=D2=1, worst-case recovery delay following detection reduced from ~3 RTT to ~1

RTT

Page 43: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

43

Estimating the Frequency of Cache Hits

• Analyzed 14 multicast transmission traces [Yajnik et al. 95/96]• On average, ~1/3 of losses recoverable by expedited recoveries• More precise identification of loss locations may lead to the recovery of

~1/2 of losses by expedited recoveries

Abstract loss location representationActual loss location representation

Page 44: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

44

IV. Modeling Framework

• To support all this analysis, we need a well-designed mathematical foundation, capable of:– Describing all the algorithms we want to consider.– Supporting compositional and conditional analysis.

• We use a framework based on interacting state machines.– Basic asynchronous model (I/O automata)– Augmented models: Timed, hybrid (continuous/discrete),

probabilistic.

Page 45: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

45

I/O Automata [Lynch, Tuttle 87]

• Nondeterministic, infinite-state automata– States, start states– Actions: Input, output, internal – Transitions (s,a,s’)– Executions, traces– A implements B if traces(A) traces(B)

• Describing system modularity:– Parallel composition – Levels of abstraction

• Reasoning methods:– Invariant assertions– Simulation relations– Compositional methods

• Used to describe asynchronous distributed algorithms.

Page 46: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

46

Timed I/O Automata (TIOA) [Merritt, Modugno,Tuttle], [Lynch, Vaandrager]

• Add time-passage actions • Used to describe:

– Timeout-based algorithms.– Local clocks, clock synchronization.– Timing/performance characteristics.

Page 47: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

47

Hybrid I/O Automata (HIOA) [Lynch, Segala, Vaandrager 01, 02]

• Automata with continuous and discrete transitions– States: Input, output, internal variables; start states– Actions: Input, output, internal– Discrete transitions (s,a,s’)– Trajectories , mapping time intervals to states– Execution 0 a1 1 a2 2 …– Trace: Project on external variables, external actions.– A implements B if traces(A) traces(B).

• Composition, levels of abstraction.• Invariants, simulation relations, compositional reasoning• Used to describe:

– Controlled systems– Automated transportation systems– Embedded systems

Page 48: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

48

Timed I/O Automata (TIOA), Revisited[Lynch, Segala, Vaandrager, Kirli]

• Have reformulated TIOA as a special case of HIOA: – No external variables: states consist of internal variables only.

• Use trajectories to describe time-passage, instead of time-passage actions.

• Monograph on modeling timed systems:– Theory– Analysis methods– Examples– Relationships with other timed models

[Alur, Dill], [Merritt, Modugno, Tuttle], [Maler, Manna, Pnueli]

Page 49: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

49

Probabilistic Automata (PIOA, PTIOA)[Segala 95] [Segala, Vaandrager, Lynch 02]

• Add probabilistic transitions (s,a,)• Work in progress [Segala, Vaandrager, Lynch],

[de Alfaro, Henzinger]:– External behavior notion.– Composition theorems. – Implementation relationships

• Used to describe:– Probabilistic and nondeterministic behavior.– Randomized distributed algorithms– Systems with probabilistic assumptions

Page 50: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

50

V. Plans for the Next Two Years

Page 51: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

51

Plans: Distributed Algorithms• Reconfigurable atomic memory

– LAN implementation [Musial, Shvartsman]– More analysis:

• “Normal behavior” starting from some point– Algorithmic improvements:

• Concurrent garbage-collection [Gilbert]• Reduced communication• Better join protocol• Faster reads

– Extensions:• “Leave” protocol• Backup strategies for when configurations fail• Support for choosing configurations

Page 52: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

52

Plans: Distributed Algorithms• Reliable multicast protocols [Livadas]:

– Extend SRM analysis to handle nodes leaving and failing.– Finish CESRM analysis.– Analyze LMS protocol [Papadopoulos, Varghese 98].

• Mobile systems: – Topology control [Hajiaghayi, Mirrokni]– Time synchronization – Tracking – Resource allocation– Data management

• Peer-to-peer systems [Lynch, Stoica]: – Location services that are provably fault-tolerant under reasonable

steady-state assumptions. – Data management over location services

Page 53: 1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Arlington-Ballston,

53

Plans: Semantic Framework

• Timed models: – Composition theorems for timing properties.– Structured TIOAs to support conditional performance

analysis.– Relate TIOA to other models, e.g., reactive modules

[Alur, Henzinger].• Probabilistic models:

– Composition theorems [de Alfaro, Henzinger]• Integrate timed and probabilistic models into one

semantic framework.