Real-Time Communication - Paulo Verissimo

Presented by Kunmun Garabadu & Roney Philip

Real time communication

To achieve real-time communication:

Real time protocols

Real time networks - timely and reliable

Characteristics of real time communication

Known and bounded message delivery delay

Deterministic behavior in the presence of disturbing factors

Recognition of latency classes

Connectivity

Real time networks

LAN or MAN

LAN Small scale

Reliable to very reliable

Span a few thousand metres

Round trip times 10^-5 to 10^-1 secs

Reliability Strategies

Faults lead to:

Lost messages

Delays

Corrupted contents

Solution:

Space redundancy - replicated hardware

• Mandatory for critical systems like flight control

Time redundancy - message repetition

Reliability Strategies

Space redundancy cons

• High cost of hardware

• Complex

Time redundancy cons

• Communication reliability low for real-time applications

Which methods and techniques to use?

Ask 2 questions:

1. Can we reliably obtain real time behavior out of simplex (non-replicated) networks?

2. Which protocols and QoS to use?

Reliability Strategies

Solution to 1

Combination of simplex standard LANs

Space redundancy in physical layer

• To maintain connectivity

Protocol time redundancy

• Protocols see only one LAN controller

Solution to 2

For reliability of communication

• Error masking

• Error detection and forward recovery

• Error detection and backward recovery

Error masking

Assume a bounded number of failures, say k, from a particular component

Have more than k channels

Have more than k transmissions

Mask k failures

Fig: Error masking by a) space redundancy and b) time redundancy
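The masking argument can be checked exhaustively for small cases. The sketch below is only an illustration: the function names are hypothetical, and each copy (one per channel or per transmission) is modelled as simply getting through or being lost.

```python
from itertools import product

def delivered(copy_ok):
    """A message is delivered if at least one copy got through."""
    return any(copy_ok)

def masks_all_patterns(copies, k):
    """Check every loss pattern with at most k lost copies.

    copies: number of channels (space redundancy) or
            transmissions (time redundancy) used per message.
    Returns True if the message is delivered in every such pattern.
    """
    for pattern in product([True, False], repeat=copies):
        lost = pattern.count(False)
        if lost <= k and not delivered(pattern):
            return False
    return True
```

With k = 2, three copies mask every pattern of up to two losses, while two copies do not, which is why more than k channels or transmissions are needed.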

Error detection: Forward recovery

For periodic real time communication

Relationship between consecutive measurements

Possible to skip a lost msg

Wait for the next msg

Fig: a) Forward recovery (k = 1; messages 1, 2, 3; previous value V(t1) is used until refreshed by V(t3); maximum period without refreshing)
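A receiver doing forward recovery can be sketched as a holder of the last value with a staleness limit. The class below is a hypothetical illustration, not the protocol from the slides; it assumes the maximum period without refreshing is (k + 1) periods, which matches the k = 1 case in the figure where one message is skipped.

```python
class ForwardRecoveryReceiver:
    """Keeps the last received value of a periodic variable."""

    def __init__(self, period, k):
        # Assumption: up to k consecutive messages may be lost, so a
        # value may legitimately be (k + 1) periods old.
        self.max_staleness = (k + 1) * period
        self.value = None
        self.refreshed_at = None

    def on_message(self, value, now):
        """Refresh the stored value when a message arrives."""
        self.value = value
        self.refreshed_at = now

    def read(self, now):
        """Return the previous value, or fail if it is too old."""
        if self.refreshed_at is None:
            raise LookupError("no value received yet")
        if now - self.refreshed_at > self.max_staleness:
            raise TimeoutError("value not refreshed within the bound")
        return self.value
```

With a period of 10 time units and k = 1, a value received at time 0 can still be read at time 20, but a read at time 21 signals that more than k messages were lost.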

Error detection: Backward recovery

Ack based protocol

Restarts when a msg is lost

Appropriate when msgs cannot be lost

Fig: b) Backward recovery (timeout, k = 1)

Making real-time LANs reliable

LANs have to display real-time behavior

Obtained by:

Establishing a model

• Traffic patterns

• Reliability and timeliness requirements

• Failure assumptions

Service and interface definition

Dressing the elementary LAN with hardware and software to comply with requirements

Abstract LAN Model

We need LAN interfacing to be LAN independent

Standardisation bodies achieved this through LLC

But no services in LLC aim at real-time behavior, reliability, etc.

So we devise a complete model overcoming these problems

Using some of the properties of LANs to implement protocols

Abstract LAN Properties

An1 – Broadcast

An2 – Error Detection

An3 – Network Order

An4 – Full Duplex

An5 – Tightness

An6 – Bounded Transmission Delay

An7 – Bounded Omission Degree

An8 – Bounded Inaccessibility

Real time communication requirements

LAN components display following failures:

Timing failures

Omission failures

Network partitions

Definition of reliable real time network

RT - “A reliable real-time network displays bounded and known message delivery delay, in the presence of disturbing factors such as overload or faults”

Real time communication requirements

Some networks recognize urgency

Urgency classes

Critical or hard real-time

Best-effort or soft real-time

Background or non real-time

Solution to real-time communication requirements

Enforce bounded delay from request to transmission of a frame given the worst case conditions assumed (avoid timing failures)

Ensure that a message is delivered despite the occurrence of omissions (tolerate omission failures)

Maintain connectivity (control partitions)

Enforcing Bounded Transmission Delay

An6 not guaranteed

Factors to take into account:

Traffic patterns

Latency classes

LAN sizing and parametrising

User-level load/flow control

Traffic patterns

Designer must model the traffic offered to the network

Aperiodic traffic

No guarantees about transmission delays

Cyclic traffic – defined by period

Sporadic traffic – bursty

Latency classes

Traffic separation in latency classes

Highest criticality traffic should be given lowest latency class

Should be given certain amount of channel bandwidth to fulfill latency requirements

Enforce a given transmission time bound for every sender

LAN sizing and parametrising

LAN sized and parametrised to comply with aimed bound or vice-versa

Aimed latency not achievable with offered load

Consequences

• Latency goes up

• Number of nodes and/or their offered load go down

• Sending node reduces its traffic demands

Iterative procedure

User level load/flow control

Flow based load control delays transmissions

Role of real-time load control

Regulate global offered load

Throttle individual traffic

Sporadic event class has bounds for

Interarrival rate

Burst length

Burst rate

Fig: Timing pattern of sporadic events (average interarrival time, minimum interarrival time, burst length, burst period)


Rate based flow control

Calculate average interarrival rate

Manipulate the rate at which data is sent

Smooths the bursty nature of the traffic

Sending rate should not fall below the average interarrival rate
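The smoothing effect of rate control can be sketched as a pacer that spaces out sends. This is an illustrative assumption rather than a mechanism given in the slides: `pace` and `min_gap` are hypothetical names, and `min_gap` is meant to be at most the average interarrival time so that the sender's queue does not grow without bound.

```python
def pace(arrival_times, min_gap):
    """Return send times with at least min_gap between consecutive sends.

    arrival_times: non-decreasing times at which messages are requested.
    A message is sent as soon as it arrives, unless the previous send
    was less than min_gap ago, in which case it is delayed.
    """
    send_times = []
    next_free = None  # earliest time the next send is allowed
    for t in arrival_times:
        send = t if next_free is None else max(t, next_free)
        send_times.append(send)
        next_free = send + min_gap
    return send_times
```

For a burst arriving at times 0, 1, 2 followed by a message at 30 (average interarrival time 10), `pace([0, 1, 2, 30], 10)` spreads the sends to 0, 10, 20, 30.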


Load control mechanisms

Rate control

• Suited for periodic and sporadic traffic

• Matches senders' and recipients' capabilities

• No discontinuities in traffic flow

Credit control

• Allocates recipients some credits

• When credit is over, recipient refuses to accept more information

• Improved scheme – look ahead credit request or supply
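Credit control can be sketched as a recipient that accepts one message per credit and refuses further messages until credit is supplied again. The class and method names below are hypothetical, and the look-ahead scheme is not modelled.

```python
class CreditedRecipient:
    """Recipient that accepts messages only while it has credit."""

    def __init__(self, credits):
        self.credits = credits
        self.accepted = []

    def offer(self, msg):
        """Accept msg if credit remains; return whether it was accepted."""
        if self.credits == 0:
            return False          # credit is over: refuse
        self.credits -= 1
        self.accepted.append(msg)
        return True

    def supply(self, credits):
        """Grant additional credit, for example after buffers are freed."""
        self.credits += credits
```

Unlike rate control, this produces a discontinuity in the flow: once credit runs out the sender is stopped completely until `supply` is called.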

Handling Omission Failures

Characteristics of omissions in a LAN:

Omissions are rare.

They can occur in bursts.

Are usually the result of failure of a single component.

Omission Degree : It is the number of consecutive omissions produced by a component.

An7 : Bounded Omission Degree. In a known interval Trd, omission errors may affect at most k transmissions. This feature serves as the foundation of basic error processing protocols with deterministic termination. This is important for real time operation.

Transmission-With-Reply

tries := 0; resp := empty;
do tries < nrTries ^ resp != full ->
    resp := empty; Tx(data, id);
    waitRepliesPutInBag(TwaitReply, resp);
    tries := tries + 1;
od

Diffusion

tries := 0;
do tries < nrTries ->
    Tx(data, id);
    tries := tries + 1;
od
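The two loops above can be exercised in a small simulation. The sketch below is an assumption-laden illustration: the network is replaced by a hypothetical `send(recipient)` callback that returns whether that recipient got the frame (and, for Tx-with-reply, replied within TwaitReply), and `scripted_channel` is a test helper that loses a fixed number of initial transmissions per recipient.

```python
def tx_with_reply(send, recipients, nr_tries):
    """Retransmit until every recipient has replied or tries run out.

    Returns (all_replied, tries_used).
    """
    replied = set()
    tries = 0
    while tries < nr_tries and replied != recipients:
        # resp := empty; Tx(data, id); collect replies for this try
        replied = {r for r in recipients if send(r)}
        tries += 1
    return replied == recipients, tries

def diffusion(send, recipients, nr_tries):
    """Always transmit nr_tries times; return the recipients reached."""
    reached = set()
    for _ in range(nr_tries):
        reached |= {r for r in recipients if send(r)}
    return reached

def scripted_channel(losses):
    """Hypothetical lossy channel: losses maps a recipient to the
    number of initial transmissions that are lost on the way to it."""
    remaining = dict(losses)
    def send(recipient):
        if remaining.get(recipient, 0) > 0:
            remaining[recipient] -= 1
            return False
        return True
    return send
```

With k = 1 omission and nrTries = k + 1 = 2, Tx-with-reply needs two tries only when an omission actually occurs and a single try otherwise, whereas diffusion always sends both copies.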

Tx-with-reply

Optimal for average case where error rate is expected to be low

Only one try in absence of errors

Identifier id allows to distinguish between duplicate messages.

It aims for a completely correct series

It allows for complete order among competing LAN transmissions.

Diffusion

At least one instance of the message reaches every node

It repeats transmission k + 1 times.

Both algorithms execute within a bounded time in absence of partitions

Comparison of Algorithms

Features | Tx-with-reply | Diffusion

Worst-case delivery delay | k.TwaitReply + Ttd | (k+1).Ttd

No fault delivery delay | equal | equal

Processing overhead | highest

Scalability | equal | equal

Network load | highest
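The two worst-case delivery delay expressions from the comparison can be evaluated directly. The function names and the sample figures in the usage note are illustrative assumptions; only the formulas come from the table.

```python
def worst_case_delay_tx_with_reply(k, t_wait_reply, t_td):
    """k failed tries, each waiting TwaitReply, then one delivery in Ttd."""
    return k * t_wait_reply + t_td

def worst_case_delay_diffusion(k, t_td):
    """k + 1 back-to-back transmissions, each bounded by Ttd."""
    return (k + 1) * t_td
```

For example, with k = 2, TwaitReply = 5 and Ttd = 2 (same time unit), Tx-with-reply is bounded by 12 and diffusion by 6. The gap comes from TwaitReply normally being larger than Ttd, since it has to cover the replies as well as the transmission.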

Comparison of Algorithms

Features | Tx-with-reply | Diffusion

Total order | possible | not possible

Failure detection | yes | no

Upper layer inform in reply frame | possible | not applicable

Resilience to lack of coverage | high | none

Processing overhead | highest

Inaccessibility

RT: Maintain connectivity

An8 : Bounded Inaccessibility. In a known interval Trd, the network may be inaccessible at most i times with a total duration of at most Tina.

Network is partitioned into subsets of nodes that cannot communicate.

Causes of partition : bus medium failure, ring disruption, transmitter or receiver defects, token loss etc.

Controlling partition : Solution is in knowing how long a partition lasts. This should be sufficiently small so that the service can be carried on effectively

Inaccessibility : Period of time for which the partition lasts.

Inaccessibility Control

How to implement inaccessibility control?

Instrument the LAN to recover from all conditions leading to partition

Have a bound for number and duration of inaccessibility periods

Accommodate inaccessibility in the protocols and timeliness calculations.

Determine the upper bound for recovery from partitioning

The upper bound may be dependent on operating situation specific to each LAN.

If network is properly managed and parameterised, inaccessibility figures can be drastically reduced.

Inaccessibility in Timeliness Model

Inaccessibility must be accounted for in the following:

Calculations of real worst case execution times

Dimensioning of timeouts

Synchronous real-time operation of LAN:

Tina has to be added to the real worst-case execution time of protocols

The protocol may fail if it times out too early but inaccessibility occurs.

Including Tina in time-outs is a sufficient condition for running synchronous operation

Tina may be much greater than Ttd, causing timeouts to be undesirably long.

Better to take inaccessibility off from the time-outs

Methods to remove inaccessibility:

Timer Freezing:

Inaccessibility is detected

All timers used in time-outs are suspended

Timers are restarted when the network becomes accessible

Inaccessibility Trapping:

Each inaccessibility period inside two consecutive transmission signals from the LAN is trapped

This avoids more than one timeout per inaccessibility period.

Each inaccessibility occurrence counts as one omission.

Extra omissions have to be added in the retry count of the low level protocols.
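Timer freezing can be sketched as a timeout that only consumes time while the network is accessible. The class below is a hypothetical illustration that takes explicit timestamps instead of reading a real clock; how inaccessibility is detected is outside its scope.

```python
class FreezableTimer:
    """Timeout that is suspended during inaccessibility periods."""

    def __init__(self, timeout, now):
        self.remaining = timeout
        self.frozen = False
        self.last = now

    def _advance(self, now):
        # Time only counts against the timeout while not frozen.
        if not self.frozen:
            self.remaining -= now - self.last
        self.last = now

    def freeze(self, now):
        """Called when inaccessibility is detected."""
        self._advance(now)
        self.frozen = True

    def resume(self, now):
        """Called when the network becomes accessible again."""
        self._advance(now)
        self.frozen = False

    def expired(self, now):
        self._advance(now)
        return self.remaining <= 0
```

The timeout can then be dimensioned from Ttd alone: a timer of 10 units frozen from time 4 to time 50 expires at time 56 rather than at time 10, without Tina being added to its value.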

LAN Redundancy

Enforcement of bounded omission degree and bounded inaccessibility can be obtained through redundancy in the physical and medium layers

FDDI has a dual-reconfiguring ring capable of surviving just one interruption.

Token-bus and Ethernet have no standardised redundancy.

Extra measures have to be implemented to survive multiple failures.

Dual Media Token Bus LAN

Fig: Dual Media Token Bus LAN (higher-level protocols; medium-access control VLSI; selector state machines; two physical layers)

Addressing

Efficient and timely to meet real-time requirements.

Reception of frames not addressed to anyone in the node has to be avoided

Frame addressing involves the following:

Construction of the address at frame transmission

Interpretation of the address of the passing or received frame

Address formats correspond to (type; addressing mode)

Type performs the first step in selection; it points to a set of possible filters

Mode selects the appropriate filter.

Addressing

Classification of several addressing modes:

Individual : It enables a sender to address a particular station by its physical address.

Broadcast : It enables a frame to be accepted in all nodes.

Logical : It is intended to address a given group of nodes identified by an n-bit gate address independent of their location and number.

Selective : It consists of an n-bit binary chain in which each of the bits represents a node. The association between a station and a bit can be static or dynamic.
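The four addressing modes can be read as four receive filters. The sketch below is illustrative only: the mode names are taken from the list above, but the `station` dictionary layout (`physical`, `groups`, `bit`) is a hypothetical representation, not a frame format from the slides.

```python
def accepts(mode, address, station):
    """Decide whether a station should accept a frame.

    station: dict with the station's physical address, the set of
    logical (gate) addresses it belongs to, and its bit position for
    selective addressing.
    """
    if mode == "broadcast":
        return True                                  # accepted in all nodes
    if mode == "individual":
        return address == station["physical"]
    if mode == "logical":
        return address in station["groups"]
    if mode == "selective":
        return bool((address >> station["bit"]) & 1)  # one bit per node
    raise ValueError(f"unknown addressing mode: {mode}")
```

A station holding bit 2 accepts the selective address 0b0100 and ignores 0b1011, so a sender can target any subset of nodes with a single frame.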

Processor Group Membership

It provides a map of the nodes belonging to the group.

It is independent of higher level groupings of processes.

It maintains an Active Stations Table (AST)

AST provides the station ordering and a basic mask where stations are marked “up” or “down”

ST1   ST2   ST3   ST4
up    up    up    down

Processor Group Membership

Categories of events that PGM responds to: Insert/Delete, Join/Leave, Failure

PGM functions:

Maintenance of AST: Responds to insert/delete requests

Provision of Short Addresses: Reference a node by its position in the AST

Failure and Group Change Handling: Acts upon suspicion of failure that may come from a network driver, group communication protocol, etc.

Information about group members: Can respond to a number of requests regarding group members.
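The AST functions listed above can be sketched as a small table type. This is a hypothetical data structure for illustration; the method names are assumptions, and the short address is taken to be the zero-based position in the table.

```python
class ActiveStationsTable:
    """Ordered table of stations with an up/down mask."""

    def __init__(self):
        self.order = []   # station names in table order
        self.up = {}      # station name -> True ("up") or False ("down")

    def insert(self, station):
        if station not in self.up:
            self.order.append(station)
            self.up[station] = True

    def delete(self, station):
        self.order.remove(station)
        del self.up[station]

    def mark(self, station, up):
        """Record a station as up or down, e.g. on suspicion of failure."""
        self.up[station] = up

    def short_address(self, station):
        """Reference a station by its position in the table."""
        return self.order.index(station)

    def mask(self):
        """Up/down mask in table order."""
        return [self.up[s] for s in self.order]
```

In this sketch, deleting a station shifts the short addresses of the stations after it, which is one reason every member has to install the same table before short addresses are used.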

Clockless PGM Protocol

Delta-4 System

A GroupChangeEvent for join, leave or failure cases triggers the protocol.

In case of failure, a component detecting failure issues the check request.

The node requests the other members’ state.

The node gets replies and constructs the new AST. It sends it out to members.

This is done using Tx-with-Reply to make sure all members install the new table.

The first message locks the table so that competitors are left out

With omissions more than one competitor may lock subsets of the nodes

Each of them retries incrementing a lock_level counter until one of them locks all nodes successfully and then proceeds

Clockless PGM Protocol

Fig: Clockless PGM protocol (group change event; GetState (and lock); My state; compute station table; NewState (unlock); Installed)

a) StationTableOps: Insert, Delete, Down, Up

Clock-driven PGM Protocol

AAS System

Two events trigger the protocol: upon request, like join, or passage of time

Periodically membership management is done to ensure changes are detected in bounded time

Group communication is through diffusion. Only way to detect failures is through such a protocol.

All processors diffuse an “I’m alive” message so that each and everyone will build the same view of processors alive.

Time-Triggered PGM Protocol

MARS System

Periodically all nodes broadcast their message

Each message is sent twice to overcome omission

Each processor listens to all transmissions making a vector of dimension N, where N is the number of nodes.

Vu,v is a boolean which is true when processor u saw a valid message from processor v

Vector V is then sent in the following period transmission.

All processors receive N vectors

A matrix is built which is as follows:

Each column u accounts for the messages Pu saw from all others

Each row v accounts for the messages from Pv seen by all the others

Time-Triggered PGM Protocol

This protocol detects failures with one cycle delay at most.

Matrices may not be equal in all nodes. They guarantee to have enough information to deterministically detect a failed processor.

A failed processor is one that fails to transmit both copies of its message to all or fails to receive both copies of another node’s message

P1 [  *    V2,1  V3,1  V4,1 ]
P2 [ V1,2   *    V3,2  V4,2 ]
P3 [ V1,3  V2,3   *    V4,3 ]
P4 [ V1,4  V2,4  V3,4   *   ]
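One way to read failures off such a matrix is sketched below. It is an interpretation under stated assumptions, not the MARS algorithm itself: a processor is taken to have a send failure when no other processor saw its message, and a receive failure when it missed a message that at least one other processor did see. The function name and the indexing `V[u][v]` (processor u saw processor v, zero-based) are hypothetical.

```python
def failed_processors(V):
    """Return the set of processors judged failed from matrix V.

    V[u][v] is True when processor u saw a valid message from
    processor v; diagonal entries are ignored.
    """
    n = len(V)
    # Send failure: nobody else saw a valid message from v.
    send_failed = {
        v for v in range(n)
        if not any(V[u][v] for u in range(n) if u != v)
    }
    # Receive failure: u missed a sender that others did see.
    recv_failed = {
        u for u in range(n)
        for v in range(n)
        if u != v and v not in send_failed and not V[u][v]
    }
    return send_failed | recv_failed
```

With four processors, a silent P4 shows up as a row of false entries and is the only one flagged, while a single false entry elsewhere flags the receiver of that entry.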

Summary

Real time communication

Real time networks

Real time protocols

Real time networking and reliability policies

Making real-time networks reliable and timely

Bounded transmission delays

Handling failures

Inaccessibility

Summary

Low level protocols assist high level protocols in attaining:

Transmission reliability

Selective and logical addressing
