Real-Time Communication - Paulo Verissimo
Presented by Kunmun Garabadu & Roney Philip
Post on 21-Dec-2015
Real time communication
To achieve real-time communication:
Real-time protocols
Real time networks - timely and reliable
Characteristics of real-time communication
Known and bounded message delivery delay
Deterministic behavior in the presence of disturbing factors
Recognition of latency classes
Connectivity
Real time networks
LAN or MAN
LAN Small scale
Reliable to very reliable
Span a few thousand meters
Round-trip times of 10^-5 to 10^-1 seconds
Reliability Strategies
Faults lead to:
Lost messages
Delays
Corrupted contents
Solution:
Space redundancy - replicated hardware
• Mandatory for critical systems like flight control
Time redundancy - message repetition
Reliability Strategies
Space redundancy cons
• High cost of hardware
• Complex
Time redundancy cons
• Communication reliability low for real-time applications
Which methods and techniques to use?
Ask 2 questions:
1. Can we reliably obtain real-time behavior out of simplex (non-replicated) networks?
2. Which protocols and QoS to use?
Reliability Strategies
Solution to 1
Combination of simplex standard LANs
Space redundancy in physical layer
• To maintain connectivity
Protocol time redundancy
• Protocols see only one LAN controller
Solution to 2
For reliability of communication
• Error masking
• Error detection and forward recovery
• Error detection and backward recovery
Error masking
Assume a bounded number of failures, say k, from a particular component
Have more than k channels
Have more than k transmissions
Mask k failures
Fig: a) space redundancy b) time redundancy
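Error masking by time redundancy can be sketched as follows. This is a minimal illustration, not the presenters' code: `send_with_time_redundancy` and `make_lossy_channel` are hypothetical names, and the channel is a stand-in that simply drops a fixed number of transmissions.

```python
def send_with_time_redundancy(message, k, channel):
    """Transmit the message k + 1 times and return the copies that arrived.

    `channel` is a hypothetical callable that returns the message on success
    or None on an omission.
    """
    copies = [channel(message) for _ in range(k + 1)]
    return [c for c in copies if c is not None]


def make_lossy_channel(omissions):
    """Build a test channel that drops the first `omissions` transmissions."""
    state = {"dropped": 0}

    def channel(message):
        if state["dropped"] < omissions:
            state["dropped"] += 1
            return None
        return message

    return channel


k = 2
delivered = send_with_time_redundancy("setpoint=42", k, make_lossy_channel(k))
print(delivered)  # one copy survives the k omissions
```

If the channel drops more than k transmissions, nothing is delivered, which is why the bound k has to hold for masking to work.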
Error detection: Forward recovery
For periodic real-time communication
Relationship between consecutive measurements
Possible to skip a lost message
Wait for the next message
Fig a) Forward recovery, k = 1 (messages 1, 2, 3; use previous value V(t1); refreshed V(t3); maximum period without refreshing)
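A minimal sketch of forward recovery for a periodic value, assuming a lost message shows up as `None` in the received sequence. The function name and the choice to raise an error once more than k consecutive samples are lost are illustrative assumptions.

```python
def forward_recover(samples, k):
    """Replace lost periodic samples (None) with the previous value.

    Raises if more than k consecutive samples are lost, since the value would
    then go unrefreshed for longer than the assumed maximum period.
    """
    values = []
    last = None
    missed = 0
    for sample in samples:
        if sample is None:
            missed += 1
            if last is None or missed > k:
                raise RuntimeError("omission bound exceeded; value is stale")
            values.append(last)  # skip the lost message, reuse previous value
        else:
            missed = 0
            last = sample
            values.append(sample)
    return values


print(forward_recover([20.0, None, 20.4], k=1))  # [20.0, 20.0, 20.4]
```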
Error detection: Backward recovery
Ack based protocol
Restarts when a message is lost
Appropriate when messages cannot be lost
Fig b) Backward recovery, k = 1 (timeout)
Making real-time LANs reliable
LANs have to display real-time behavior
Obtained by:
Establishing a model
• Traffic patterns
• Reliability and timeliness requirements
• Failure assumptions
Service and interface definition
Dressing the elementary LAN with hardware and software to comply with requirements
Abstract LAN Model
We need LAN interfacing to be LAN-independent
Standardisation bodies achieved this through LLC
But no service in LLC aims at real-time, reliability, etc.
So we devise a complete model overcoming these problems
Using some of the properties of LANs to implement protocols
Abstract LAN Properties
An1 – Broadcast
An2 – Error Detection
An3 – Network Order
An4 – Full Duplex
An5 – Tightness
An6 – Bounded Transmission Delay
An7 – Bounded Omission Degree
An8 – Bounded Inaccessibility
Real time communication requirements
LAN components display the following failures:
Timing failures
Omission failures
Network partitions
Definition of reliable real-time network
RT - “A reliable real-time network displays bounded and known message delivery delay, in the presence of disturbing factors such as overload or faults”
Real time communication requirements
Some networks recognize urgency
Urgency classes
Critical or hard real-time
Best-effort or soft real-time
Background or non real-time
Solution to real-time communication requirements
Enforce bounded delay from request to transmission of a frame given the worst case conditions assumed (avoid timing failures)
Ensure that a message is delivered despite the occurrence of omissions (tolerate omission failures)
Maintain connectivity (control partitions)
Enforcing Bounded Transmission Delay
An6 not guaranteed
Factors to take into account:
Traffic patterns
Latency classes
LAN sizing and parametrising
User-level load/flow control
Traffic patterns
Designer must model the traffic offered to the network
Aperiodic traffic – no guarantees about transmission delays
Cyclic traffic – defined by period
Sporadic traffic – bursty
Latency classes
Traffic separation in latency classes
Highest criticality traffic should be given lowest latency class
Should be given certain amount of channel bandwidth to fulfill latency requirements
Enforce a given transmission time bound for every sender
LAN sizing and parametrising
LAN sized and parametrised to comply with aimed bound or vice-versa
Aimed latency not achievable with offered load
Consequences
• Latency goes up
• Number of nodes and/or their offered load go down
• Sending node reduces its traffic demands
Iterative procedure
User level load/flow control
Flow-based load control delays transmissions
Role of real-time load control
Regulate global offered load
Throttle individual traffic
Sporadic event class has bounds for
Interarrival rate
Burst length
Burst rate
Fig: Timing pattern of sporadic events (average interarrival time, minimum interarrival time, burst length, burst period)
User level load/flow control
Rate-based flow control
Calculate the average interarrival rate
Manipulate the rate at which data is sent
Smooths the bursty nature of the traffic
The sending rate should not fall below the average interarrival rate
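One way to picture rate-based smoothing is a pacer that spaces transmissions at least 1/rate apart. The sketch below is an assumption-laden illustration: `pace` is a hypothetical helper, and times are plain numbers in arbitrary units.

```python
def pace(arrival_times, rate):
    """Return send times so that consecutive sends are at least 1/rate apart.

    A message is never sent before it arrives; bursts are spread out.
    """
    spacing = 1.0 / rate
    send_times = []
    next_free = float("-inf")
    for t in arrival_times:
        send_at = max(t, next_free)
        send_times.append(send_at)
        next_free = send_at + spacing
    return send_times


burst = [0.0, 0.0, 0.0, 4.0]    # three events at once, then one later
print(pace(burst, rate=1.0))    # [0.0, 1.0, 2.0, 4.0]
```

In this example the average arrival rate is one event per time unit, so a sending rate of 1.0 just keeps up. A lower rate would let the backlog grow without bound.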
User level load/flow control
Load control mechanisms
Rate control
• Suited for periodic and sporadic traffic
• Matches senders' and recipients' capabilities
• No discontinuities in traffic flow
Credit control
• Allocates recipients some credits
• When credit is exhausted, the recipient refuses to accept more information
• Improved scheme: look-ahead credit request or supply
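Credit control as described above can be sketched with a recipient that holds a credit counter. The class and method names are hypothetical, and the look-ahead variant is only hinted at by the `grant` method.

```python
class CreditedReceiver:
    """Recipient that accepts information only while it has credit left."""

    def __init__(self, credits):
        self.credits = credits
        self.accepted = []

    def offer(self, item):
        """Accept the item if credit remains; otherwise refuse it."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.accepted.append(item)
        return True

    def grant(self, credits):
        """Supply additional credit, e.g. ahead of the next burst."""
        self.credits += credits


receiver = CreditedReceiver(credits=2)
print([receiver.offer(m) for m in ("a", "b", "c")])  # [True, True, False]
receiver.grant(1)
print(receiver.offer("c"))                           # True
```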
Handling Omission Failures
Characteristics of omissions in a LAN:
Omissions are rare.
They can occur in bursts.
Are usually the result of failure of a single component.
Omission Degree : It is the number of consecutive omissions produced by a component.
An7 : Bounded Omission Degree. In a known interval Trd, omission errors may affect at most k transmissions. This property is the foundation of basic error-processing protocols with deterministic termination, which is important for real-time operation.
Transmission-With-Reply
tries := 0; resp := empty;
do tries < nrTries ∧ resp != full ->
    resp := empty; Tx(data, id);
    waitRepliesPutInBag(TwaitReply, resp);
    tries := tries + 1;
od
Diffusion
tries := 0;
do tries < nrTries ->
Tx(data, id);
tries :=tries + 1;
od
Tx-with-reply
Optimal for the average case, where the error rate is expected to be low
Only one try in the absence of errors
Identifier id allows duplicate messages to be distinguished
It aims for a completely correct series
It allows for complete order among competing LAN transmissions.
Diffusion
At least one instance of the message reaches every node
It repeats transmission k + 1 times.
Both algorithms execute within a bounded time in the absence of partitions
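The two loops can be rendered in Python roughly as below. This is a sketch under assumptions: `tx` is a hypothetical function that broadcasts one frame and returns the set of recipients whose reply arrived within TwaitReply, and `make_tx` is a test stand-in that loses the first few frames.

```python
def tx_with_reply(data, msg_id, recipients, nr_tries, tx):
    """Retransmit until every recipient has replied or nr_tries is reached.

    Returns (tries used, True if all replies were collected in one round).
    """
    tries = 0
    replies = set()
    while tries < nr_tries and replies != recipients:
        replies = tx(data, msg_id)  # the reply bag is emptied on every try
        tries += 1
    return tries, replies == recipients


def diffusion(data, msg_id, nr_tries, tx):
    """Transmit the frame nr_tries times unconditionally (nr_tries = k + 1)."""
    for _ in range(nr_tries):
        tx(data, msg_id)
    return nr_tries


def make_tx(recipients, omissions):
    """Test stand-in: the first `omissions` frames get no replies at all."""
    sent = []

    def tx(data, msg_id):
        sent.append((data, msg_id))
        return set() if len(sent) <= omissions else set(recipients)

    return tx, sent


nodes = {"A", "B"}
tx, sent = make_tx(nodes, omissions=1)
print(tx_with_reply("data", 7, nodes, nr_tries=3, tx=tx))  # (2, True)
```

With no omissions the first loop stops after a single try, while diffusion always sends all k + 1 copies.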
Comparison of Algorithms
Features | Tx-with-Reply | Diffusion
Worst-case delivery delay | k.TwaitReply + Ttd | (k+1).Ttd
No-fault delivery delay | equal | equal
Processing overhead | highest
Scalability | equal | equal
Network load | highest
Comparison of Algorithms
Features | Tx-with-Reply | Diffusion
Total order | possible | not possible
Failure detection | yes | no
Upper layer inform in reply frame | possible | not applicable
Resilience to lack of coverage | high | none
Processing overhead | highest
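The two worst-case delivery delay expressions from the comparison can be evaluated directly. The numbers below (k = 2, TwaitReply = 5 ms, Ttd = 2 ms) are made up for illustration and do not come from the slides.

```python
def worst_case_tx_with_reply(k, t_wait_reply, t_td):
    """k failed tries, each waiting TwaitReply, then one successful delivery."""
    return k * t_wait_reply + t_td


def worst_case_diffusion(k, t_td):
    """k + 1 back-to-back transmissions, each bounded by Ttd."""
    return (k + 1) * t_td


k, t_wait_reply_ms, t_td_ms = 2, 5, 2   # illustrative values only
print(worst_case_tx_with_reply(k, t_wait_reply_ms, t_td_ms))  # 12
print(worst_case_diffusion(k, t_td_ms))                       # 6
```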
Inaccessibility
RT: Maintain connectivity
An8 : Bounded Inaccessibility. In a known interval Trd, the network may be inaccessible at most i times with a total duration of at most Tina.
Network is partitioned into subsets of nodes that cannot communicate.
Causes of partition : bus medium failure, ring disruption, transmitter or receiver defects, token loss, etc.
Controlling partition : The solution lies in knowing how long a partition lasts. This should be sufficiently small so that the service can be carried on effectively.
Inaccessibility : Period of time for which the partition lasts.
Inaccessibility Control
How to implement inaccessibility control?
Instrument the LAN to recover from all conditions leading to partition
Have a bound for the number and duration of inaccessibility periods
Accommodate inaccessibility in the protocols and timeliness calculations.
Determine the upper bound for recovery from partitioning
The upper bound may be dependent on the operating situation specific to each LAN.
If the network is properly managed and parameterised, inaccessibility figures can be drastically reduced.
Inaccessibility in Timeliness Model
Inaccessibility must be accounted for in the following:
Calculations of real worst-case execution times
Dimensioning of timeouts
Synchronous real-time operation of LAN:
Tina has to be added to the real worst-case execution time of protocols
The protocol may fail if it times out too early but inaccessibility occurs.
Including Tina in time-outs is a sufficient condition for running synchronous operation
Tina may be much greater than Ttd, causing timeouts to be undesirably long.
Better to take inaccessibility out of the time-outs
Methods to remove inaccessibility:
Timer Freezing :
Inaccessibility is detected
All timers used in time-outs are suspended
Timers are restarted when the network becomes accessible
Inaccessibility Trapping :
Each inaccessibility period between two consecutive transmission signals from the LAN is trapped
This avoids more than one timeout per inaccessibility period.
Each inaccessibility occurrence counts as one omission.
Extra omissions have to be added to the retry count of the low-level protocols.
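Timer freezing can be sketched with a timeout that stops counting while the network is inaccessible. The class below is a hypothetical illustration driven by explicit `tick` calls; it assumes "restarted" means the timer resumes where it left off rather than being reset.

```python
class FreezableTimer:
    """Timeout that does not count time spent while the LAN is inaccessible."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.elapsed = 0.0
        self.frozen = False

    def freeze(self):
        """Call when inaccessibility is detected."""
        self.frozen = True

    def resume(self):
        """Call when the network becomes accessible again."""
        self.frozen = False

    def tick(self, dt):
        """Advance by dt time units; return True once the timeout has expired."""
        if not self.frozen:
            self.elapsed += dt
        return self.elapsed >= self.timeout


timer = FreezableTimer(timeout=3.0)
print(timer.tick(2.0))    # False
timer.freeze()
print(timer.tick(10.0))   # False: the inaccessibility period is not counted
timer.resume()
print(timer.tick(1.0))    # True
```

The timeout can then be dimensioned from Ttd alone instead of including Tina.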
LAN Redundancy
Enforcement of bounded omission degree and bounded inaccessibility can be obtained through redundancy in the physical and medium layers
FDDI has a dual-reconfiguring ring capable of surviving just one interruption.
Token-bus and Ethernet have no standardised redundancy.
Extra measures have to be implemented to survive multiple failures.
Dual Media Token Bus LAN
Fig: Dual Media Token Bus LAN (higher-level protocols; medium-access control VLSI; selector state machines; two physical layers)
Addressing
Efficient and timely to meet real-time requirements.
Reception of frames not addressed to anyone in the node has to be avoided
Frame addressing involves the following:
Construction of the address at frame transmission
Interpretation of the address of the passing or received frame
Address formats correspond to (type; addressing mode)
Type performs the first step in selection; it points to a set of possible filters
Mode selects the appropriate filter.
Addressing
Classification of several addressing modes:
Individual : It enables a sender to address a particular station by its physical address.
Broadcast : It enables a frame to be accepted in all nodes.
Logical : It is intended to address a given group of nodes identified by an n-bit gate address, independent of their location and number.
Selective : It consists of an n-bit binary chain in which each bit represents a node. The association between a station and a bit can be static or dynamic.
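The four addressing modes amount to four different receive filters. The sketch below assumes a station described by a dict with hypothetical fields `physical`, `groups` and `bit`; none of these names come from the slides.

```python
def accepts(station, mode, address):
    """Decide whether `station` accepts a frame under the given addressing mode.

    station: {'physical': int, 'groups': set of logical addresses,
              'bit': position of this station in the selective bit chain}
    """
    if mode == "individual":
        return address == station["physical"]
    if mode == "broadcast":
        return True
    if mode == "logical":
        return address in station["groups"]
    if mode == "selective":
        return bool((address >> station["bit"]) & 1)
    raise ValueError(f"unknown addressing mode: {mode}")


station = {"physical": 5, "groups": {0x10}, "bit": 2}
print(accepts(station, "selective", 0b0100))  # True: bit 2 is set
print(accepts(station, "selective", 0b1011))  # False: bit 2 is clear
```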
Processor Group Membership
It provides a map of the nodes belonging to the group.
It is independent of higher-level groupings of processes.
It maintains an Active Stations Table (AST)
AST provides the station ordering and a basic mask where stations are marked “up” or “down”
ST1   ST2   ST3   ST4
up    up    up    down
Processor Group Membership
Categories of events that PGM responds to : Insert/Delete, Join/Leave, Failure
PGM functions :
Maintenance of AST : Responds to insert/delete requests
Provision of Short Addresses : Reference a node by its position in the AST
Failure and Group Change Handling : Acts upon suspicion of failure that may come from a network driver, group communication protocol, etc.
Information about group members : Can respond to a number of requests regarding group members.
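A minimal sketch of an Active Stations Table, assuming stations are identified by name and that the short address is simply the zero-based position in the table. The class and its methods are illustrative, not the Delta-4 data structure.

```python
class ActiveStationsTable:
    """Station ordering plus an up/down mask."""

    def __init__(self):
        self.stations = []   # station ordering
        self.up = {}         # mask: station -> True ("up") or False ("down")

    def insert(self, station):
        if station not in self.up:
            self.stations.append(station)
            self.up[station] = True

    def delete(self, station):
        self.stations.remove(station)
        del self.up[station]

    def mark(self, station, is_up):
        if station not in self.up:
            raise KeyError(station)
        self.up[station] = is_up

    def short_address(self, station):
        """Reference a node by its position in the table."""
        return self.stations.index(station)

    def mask(self):
        return ["up" if self.up[s] else "down" for s in self.stations]


ast = ActiveStationsTable()
for name in ("ST1", "ST2", "ST3", "ST4"):
    ast.insert(name)
ast.mark("ST4", False)
print(ast.mask())                 # ['up', 'up', 'up', 'down']
print(ast.short_address("ST3"))   # 2
```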
Clockless PGM Protocol
Delta-4 System
A GroupChangeEvent for join, leave or failure cases triggers the protocol.
In case of failure, a component detecting failure issues the check request.
The node requests the other members’ state.
The node gets replies and constructs the new AST. It sends it out to members. This is done using Tx-with-Reply to make sure all members install the new table.
The first message locks the table so that competitors are left out
With omissions, more than one competitor may lock subsets of the nodes
Each of them retries, incrementing a lock_level counter, until one of them locks all nodes successfully and then proceeds
Clockless PGM Protocol
Fig: Message exchange after a group change event (GetState (and lock); My state; compute station table; NewState (unlock); Installed)
a) StationTableOps: Insert, Delete, Down, Up
Clock-driven PGM Protocol
AAS System
Two events trigger the protocol : Upon request like join or passage of time
Periodically membership management is done to ensure changes are detected in bounded time
Group communication is through diffusion. Only way to detect failures is through such a protocol.
All processors diffuse an “I’m alive” message so that each and every one will build the same view of processors alive.
Time-Triggered PGM Protocol
MARS System
Periodically all nodes broadcast their message
Each message is sent twice to overcome omission
Each processor listens to all transmissions, making a vector of dimension N, where N is the number of nodes. Vu,v is a boolean which is true when processor u saw a valid message from processor v
Vector V is then sent in the following period transmission. All processors receive N vectors
A matrix is built as follows:
Each column u accounts for the messages Pu saw from all others
Each row v accounts for the messages from Pv seen by all the others
Time-Triggered PGM Protocol
This protocol detects failures with one cycle delay at most.
Matrices may not be equal in all nodes, but they are guaranteed to hold enough information to deterministically detect a failed processor.
A failed processor is one that fails to transmit both copies of its message to all or fails to receive both copies of another node’s message
P1   *     V2,1  V3,1  V4,1
P2   V1,2  *     V3,2  V4,2
P3   V1,3  V2,3  *     V4,3
P4   V1,4  V2,4  V3,4  *
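Building the matrix from the N received vectors can be sketched as follows. The detection rule shown, flagging a processor that no other processor saw, is a deliberately simplified assumption and not the full MARS decision procedure; processors are numbered from 0 here.

```python
def build_matrix(vectors):
    """vectors[u][v] is True when processor u saw a valid message from v.

    Returns M with M[v][u] = vectors[u][v], so row v lists who saw Pv and
    column u lists what Pu saw. Diagonal entries are None.
    """
    n = len(vectors)
    return [[None if u == v else vectors[u][v] for u in range(n)]
            for v in range(n)]


def unseen_by_all(matrix):
    """Indices of processors that no other processor saw (simplified rule)."""
    return [v for v, row in enumerate(matrix)
            if not any(seen for seen in row if seen is not None)]


vectors = [
    [True, True, True, False],   # what P1 saw
    [True, True, True, False],   # what P2 saw
    [True, True, True, False],   # what P3 saw
    [True, True, True, True],    # P4 hears everyone, but nobody hears P4
]
matrix = build_matrix(vectors)
print(unseen_by_all(matrix))     # [3], i.e. P4
```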
Summary
Real-time communication
Real-time networks
Real-time protocols
Real-time networking and reliability policies
Making real-time networks reliable and timely
Bounded transmission delays
Handling failures
Inaccessibility
Summary
Low level protocols assist high level protocols in attaining:
Transmission reliability
Selective and logical addressing