Post on 19-Dec-2015
1
Outline
Part 3: Models of Computation
  FSMs
  Discrete Event Systems
  CFSMs
  Data Flow Models
  Petri Nets
  The Tagged Signal Model
2
Discrete Event
Explicit notion of time (global order…)
Events can happen at any time, asynchronously
As soon as an input appears at a block, the block may be executed
Execution may take non-zero time; the output is marked with a time that is the sum of the arrival time and the execution time
Time determines the order in which events are processed
A DE simulator maintains a global event queue (Verilog and VHDL)
Drawbacks
  Global event queue => tight coordination between parts
  Simultaneous events => non-deterministic behavior
  Some simulators use delta delay to prevent non-determinacy
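The global event queue described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block names, the `exec_time` and `fanout` tables, and the function `simulate` are hypothetical and do not come from any particular simulator.

```python
import heapq

def simulate(events, exec_time, fanout):
    """Process (time, block) events in global time order.

    events:    initial list of (arrival_time, block_name)
    exec_time: block_name -> execution delay (assumed non-zero here)
    fanout:    block_name -> list of downstream block names
    """
    queue = list(events)
    heapq.heapify(queue)                  # the single global event queue
    trace = []
    while queue:
        t, block = heapq.heappop(queue)   # earliest event is processed first
        trace.append((t, block))
        out_time = t + exec_time[block]   # output time = arrival + execution
        for nxt in fanout.get(block, []):
            heapq.heappush(queue, (out_time, nxt))
    return trace

trace = simulate(
    [(0, "A")],
    {"A": 2, "B": 1, "C": 3},
    {"A": ["B", "C"], "B": ["C"]},
)
print(trace)
```

Note that B and C both receive an event at time 2. The heap happens to break the tie by block name, which is an arbitrary choice: this is exactly the simultaneous-event problem listed under the drawbacks.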
3
Simultaneous Events in DE
[Figure: blocks A, B, and C with simultaneous events at time t, shown for three cases]
Simultaneous events at time t: fire B or C?
B has 0 delay: fire C once? or twice?
B has delta delay (its output appears at t + delta): fire C twice.
Still have a problem with 0-delay (causality) loops
Can be refined
  E.g. introduce timing constraints (minimum reaction time 0.1 s)
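One common way to model delta delay is to tag each event with a pair (time, delta step) instead of a bare time. The sketch below assumes a wiring in which B drives C and both receive an event at the same time t; the wiring table `FANOUT` and the function `firings` are hypothetical names chosen for illustration.

```python
import heapq

FANOUT = {"B": ["C"], "C": []}   # assumed wiring: B drives C

def firings(t=0):
    """Return the ordered firings when B and C both see an event at time t."""
    queue = [((t, 0), "B"), ((t, 0), "C")]   # simultaneous events
    heapq.heapify(queue)
    fired = []
    while queue:
        (time, delta), block = heapq.heappop(queue)
        fired.append(((time, delta), block))
        for nxt in FANOUT[block]:
            # A delta delay keeps the time but advances the delta step.
            heapq.heappush(queue, ((time, delta + 1), nxt))
    return fired

result = firings()
print(result)
```

With delta delay, C fires twice, once at step 0 and once at step 1 of the same time t, and the two firings have distinct tags. The order of B and C within step 0 is still decided by the heap's tie-break, so the "fire B or C?" question remains for truly simultaneous events.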
4
Outline
Part 3: Models of Computation
  FSMs
  Discrete Event Systems
  CFSMs
  Data Flow Models
  Petri Nets
  The Tagged Signal Model
5
Co-Design Finite State Machines: Combining FSM and Discrete Event
Synchrony and asynchrony
CFSM definitions
  Signals & networks
  Timing behavior
  Functional behavior
CFSM & process networks
Example of CFSM behaviors
  Equivalent classes
6
Codesign Finite State Machine
Underlying MOC of Polis and VCC
Combines aspects from several other MOCs
Preserves formality and efficiency in implementation
Mixes
  synchronicity: zero and infinite time
  asynchronicity: non-zero, finite, and bounded time
Embedded systems often contain both aspects
7
Synchrony: Basic Operation
Synchrony is often implemented with clocks
At clock ticks
  Module reads inputs, computes, and produces output
  All synchronous events happen simultaneously
  Zero-delay computations
Between clock ticks
  Infinite amount of time passes
8
Synchrony: Basic Operation (2)
Practical implementation of synchrony
  Impossible to get zero or infinite delay
  Require: computation time <<< clock period
  Computation time = 0, w.r.t. reaction time of environment
Feature of synchrony
  Functional behavior independent of timing
  Simplifies verification
  Cyclic dependencies may cause problems
    Among (simultaneous) synchronous events
9
Synchrony: Triggering and Ordering
All modules are triggered at each clock tick
Simultaneous signals
  No a priori ordering
  Ordering may be imposed by dependencies
    Implemented with delta steps
[Figure: continuous time axis with clock ticks; the computation at each tick is broken into delta steps]
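Ordering simultaneous signals by their dependencies amounts to a topological sort of the modules within one tick. A minimal sketch, assuming Python 3.9 or later for the standard `graphlib` module; the module names and the helper `tick_order` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def tick_order(depends_on):
    """Order module evaluations within a single clock tick.

    depends_on maps each module to the set of modules whose outputs
    it reads in the same tick. A cyclic dependency raises CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())

order = tick_order({"C": {"A", "B"}, "B": {"A"}, "A": set()})
print(order)   # each position corresponds to one delta step here

try:
    tick_order({"A": {"B"}, "B": {"A"}})
    cyclic = False
except CycleError:
    cyclic = True   # a zero-delay loop has no valid evaluation order
print(cyclic)
```

The failing case corresponds to the cyclic dependency among simultaneous events that makes a synchronous design ill-defined.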
10
Synchrony: System Solution
System solution
  Output reaction to a set of inputs
Well-designed system:
  Is completely specified and functional
  Has a unique solution at each clock tick
  Is equivalent to a single FSM
  Allows efficient analysis and verification
Well-designed-ness
  May need to be checked for each design (Esterel)
    Cyclic dependency among simultaneous events
11
Synchrony: Implementation Cost
Must verify synchronous assumption on final design
  May be expensive
Examples:
  Hardware
    Clock cycle > maximum computation time
    Inefficient for average case
  Software
    Process must finish computation before
      New input arrival
      Another process needs to start computation
12
Pure Asynchrony: Basic Operation
Events are never simultaneous
  No two events have the same tag
Computation starts at a change of the input
Delays are arbitrary, but bounded
13
Asynchrony: Triggering and Ordering
Each module is triggered to run at a change of input
No a priori ordering among triggered modules
  May be imposed by scheduling at implementation
14
Asynchrony: System Solution
Solution strongly dependent on input timing
At implementation
  Events may “appear” simultaneous
  Difficult/expensive to maintain total ordering
    Ordering at implementation decides behavior
    Becomes DE, with the same pitfalls
15
Asynchrony: Implementation Cost
Achieve low computation time (average)
  Different parts of the system compute at different rates
Analysis is difficult
  Behavior depends on timing
  May be easier for designs that are insensitive to
    Internal delay
    External timing
16
Asynchrony vs. Synchrony in System Design
They differ at least in
  Event buffering
  Timing of event read/write
Asynchrony
  Explicit buffering of events for each module
  Varying and unknown at start time
Synchrony
  One global copy of event
  Same start time for all modules
17
Combining Synchrony and Asynchrony
Want to combine
  Flexibility of asynchrony
  Verifiability of synchrony
Asynchrony
  Globally, a timing-independent style of thinking
Synchrony
  Local portions of a design are often tightly synchronized
Globally asynchronous, locally synchronous
  CFSM networks
18
CFSM Overview
CFSM is FSM extended with
  Support for data handling
  Asynchronous communication
CFSM has
  FSM part
    Inputs, outputs, states, transition and output relation
  Data computation part
    External, instantaneous functions
19
CFSM Overview (2)
CFSM has:
  Locally synchronous behavior
    CFSM executes based on snapshot input assignment
    Synchronous from its own perspective
  Globally asynchronous behavior
    CFSM executes in non-zero, finite amount of time
    Asynchronous from system perspective
GALS model
  Globally: Scheduling mechanism
  Locally: CFSMs
12/09/1999 20
Network of CFSMs: Depth-1 Buffers
[Figure: network of CFSM1, CFSM2, and CFSM3 connected through depth-1 buffers labeled A, B, C, F, and G; transition labels include C=>F, B=>C, F^(G==1), (A==0)=>B, C=>A, C=>B, and C=>G]
Globally Asynchronous, Locally Synchronous (GALS) model
21
Introducing a CFSM
A Finite State Machine
Input events, output events and state events
Initial values (for state events)
A transition function
  Transitions may involve complex, memory-less, instantaneous arithmetic and/or Boolean functions
All the state of the system is in the form of events
Need rules that define the CFSM behavior
22
CFSM Rules: phases
Four-phase cycle:
  Idle
  Detect input events
  Execute one transition
  Emit output events
Discrete time
  Sufficiently accurate for synchronous systems
  Feasible formal verification
Model semantics: Timed Traces, i.e. sequences of events labeled by time of occurrence
23
CFSM Rules: phases
Implicit unbounded delay between phases
Non-zero reaction time (avoids inconsistencies when interconnected)
Causal model based on partial order (global asynchronicity)
  potential verification speed-up
Phases may not overlap
Transitions always clear input buffers (local synchronicity)
24
Communication Primitives
Signals
  Carry information in the form of events and/or values
    Event signals: present/absent
    Data signals: arbitrary values
    Event, data may be paired
  Communicate between two CFSMs
    1 input buffer / signal / receiver
  Emitted by a sender CFSM
  Consumed by a receiver CFSM by setting buffer to 0
  “Present” if emitted but not consumed
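The one-place input buffer described above might be modeled as follows. This is a sketch only: the class `EventBuffer` and its method names are hypothetical, and it ignores concurrency entirely.

```python
class EventBuffer:
    """Depth-1 input buffer for one signal at one receiver."""

    def __init__(self):
        self.present = False   # event flag: emitted but not yet consumed
        self.value = None      # optional data paired with the event

    def emit(self, value=None):
        """Sender side: store the data and mark the event present."""
        self.value = value
        self.present = True

    def consume(self):
        """Receiver side: clear the event flag and return what was seen."""
        was_present = self.present
        self.present = False   # consuming sets the buffer back to 0
        return was_present, self.value

buf = EventBuffer()
buf.emit(3)
first = buf.consume()    # the event is present exactly once
second = buf.consume()   # already consumed, so no event is seen
print(first, second)
```

The data value stays in the buffer after the event is consumed; only the presence flag is cleared.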
25
Communication Primitives (2)
Input assignment
  A set of values for the input signals of a CFSM
Captured input assignment
  A set of input values read by a CFSM at a particular time
Input stimulus
  Input assignment with at least one event present
26
Signals and CFSM
CFSM
  Initiates communication through events
  Reacts only to input stimulus
    except initial reaction
  Writes data first, then emits associated event
  Reads event first, then reads associated data
27
CFSM networks
Net
  A set of connections on the same signal
  Associated with single sender and multiple receivers
  An input buffer for each receiver on a net
    Multi-cast communication
Network of CFSMs
  A set of CFSMs, nets, and a scheduling mechanism
  Can be implemented as
    A set of CFSMs in SW (program/compiler/OS/uC)
    A set of CFSMs in HW (HDL/gate/clocking)
    Interface (polling/interrupt/memory-mapped)
28
Scheduling Mechanism
At the specification level
  Should be as abstract as possible to allow optimization
  Not fixed in any way by CFSM MOC
May be implemented as
  RTOS for single processor
  Concurrent execution for HW
  Set of RTOSs for multi-processor
  Set of scheduling FSMs for HW
29
Timing Behavior
Scheduling Mechanism
  Globally controls the interaction of CFSMs
  Continually decides which CFSMs can be executed
CFSM can be
  Idle
    Waiting for input events
    Waiting to be executed by scheduler
  Executing
    Generates a single reaction
    Reads its inputs, computes, writes outputs
30
Timing Behavior: Mathematical Model
Transition Point
  Point in time a CFSM starts executing
For each execution
  Input signals are read and cleared
  Partial order between input and output
  Event is read before data
  Data is written before event emission
31
Timing Behavior: Transition Point
A transition point t_i
  Input may be read between t_i and t_{i+1}
  Event that is read may have occurred between t_{i-1} and t_{i+1}
  Data that is read may have occurred between t_0 and t_{i+1}
  Outputs are written between t_i and t_{i+1}
CFSM allows loose synchronization of event & data
  Less restrictive implementation
  May lead to non-intuitive behavior
32
Event/Data Separation
[Timing diagram: Sender S writes v1 and emits, then writes v2 and emits; Receiver R reads the event, then later reads the value; the time axis marks t1 to t4 and the transition points t_{i-1}, t_i, t_{i+1}]
Value v1 is lost even though
  It is sent with an event
  Event may not be lost
Need atomicity
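The loss of v1 can be replayed as a fixed interleaving of sender and receiver steps. The sketch below hard-codes one such interleaving, assuming the receiver reads the event before the second write and reads the value after it; the function `separated_read` is a hypothetical name.

```python
def separated_read():
    """Replay one interleaving in which R reads event and value separately."""
    value, event = None, False

    value = "v1"        # S: write v1 ...
    event = True        # ... then emit

    seen = event        # R: read the event ...
    event = False       # ... and clear it

    value = "v2"        # S: write v2 before R has read the data ...
    event = True        # ... then emit again

    got = value         # R: read the value, which is now v2
    return seen, got, event

seen, got, still_present = separated_read()
print(seen, got, still_present)
```

R reacted to the first event but observed v2, so v1 is lost. The second event is still pending, so no event is lost; only the pairing between event and value is broken.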
33
Atomicity
Group of actions considered as a single entity
May be costly to implement
Only atomicity requirement of CFSM
  Input events are read atomically
    Can be enforced in SW (bit vector) and HW (buffer)
    CFSM is guaranteed to see a snapshot of input events
Non-atomicity of event and data
  May lead to undesirable behavior
  Atomicized as an implementation trade-off decision
34
Non Atomic Data Value Reading
Receiver R1 gets (X=4, Y=5), R2 gets (X=5, Y=4)
X=4, Y=5 never occurs
Can be remedied if values are sent with events
  still suffers from separation of data and event
[Timing diagram: Sender S performs X:=4, Y:=4, X:=5, Y:=5 over times t1 to t6; Receivers R1 and R2 each perform Read X and Read Y at different times]
35
Atomicity of Event Reading
R1 sees no events, R2 sees X, R3 sees X, Y
Each sees a snapshot of events in time
Different captured input assignment
  Because of scheduling and delay
[Timing diagram: Sender S emits X, then Y, over times t1 to t5; Receivers R1, R2, and R3 each read once, at different times]
36
Functional Behavior
Transition and output relations
  input, present_state, next_state, output
At each execution, a CFSM
  Reads a captured input assignment
  If there is a match in transition relation
    consume inputs, transition to next_state, write outputs
  Otherwise
    consume no inputs, no transition, no outputs
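A single CFSM execution as described above can be sketched as a table lookup. The encoding is an assumption made for illustration: the transition relation is a dictionary keyed by (present_state, set of present input events), and the function `react` is a hypothetical name.

```python
def react(state, buffers, transitions):
    """Perform one CFSM execution.

    buffers:     dict signal -> bool (event present); mutated in place
    transitions: dict (present_state, frozenset of present signals)
                 -> (next_state, list of output signals)
    """
    captured = frozenset(s for s, present in buffers.items() if present)
    match = transitions.get((state, captured))
    if match is None:
        return state, []        # no match: consume nothing, stay put
    for s in captured:
        buffers[s] = False      # a matching transition consumes its inputs
    next_state, outputs = match
    return next_state, outputs

table = {("idle", frozenset({"a"})): ("busy", ["x"])}

bufs1 = {"a": True, "b": False}
r1 = react("idle", bufs1, table)

bufs2 = {"a": True, "b": True}   # no transition matches {a, b}
r2 = react("idle", bufs2, table)
print(r1, bufs1, r2, bufs2)
```

In the second call no transition matches, so the inputs stay in their buffers and may be seen again at the next execution.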
37
Functional Behavior (2)
Empty Transition
  No matching transition is found
Trivial Transition
  A transition that has no output and no state changes
  Effectively throws away inputs
Initial transition
  Transition to the init (reset) state
  No input event needed for this transition
38
CFSM and Process Networks
CFSM
  An asynchronous extended FSM model
  Communication via bounded non-blocking buffers
    Versus CSP and CCS (rendezvous)
    Versus SDL (unbounded queue & variable topology)
  Not continuous in Kahn’s sense
    Different event ordering may change behavior
    Versus dataflow (ordering insensitive)
39
CFSM Networks
Defined based on a global notion of time
  Total order of events
  Synchronous with relaxed timing
    Global consistent state of signals is required
    Input and output are in partial order
40
Buffer Overwrite
CFSM Network has
  Finite Buffering
  Non-blocking write
    Events can be overwritten if the sender is “faster” than receiver
To ensure no overwrite
  Explicit handshaking mechanism
  Scheduling
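Overwriting in a depth-1 buffer with non-blocking writes can be counted with a very small simulation. The schedule encoding ('S' for a sender emission, 'R' for a receiver read) and the function `lost_events` are assumptions made for this sketch.

```python
def lost_events(schedule):
    """Count overwritten events for one depth-1 buffer.

    schedule is a string of steps: 'S' = sender emits, 'R' = receiver reads.
    """
    present = False
    lost = 0
    for step in schedule:
        if step == "S":
            if present:
                lost += 1       # non-blocking write overwrites the old event
            present = True
        elif step == "R":
            present = False     # reading consumes the event
    return lost

print(lost_events("SRSR"))      # sender and receiver alternate
print(lost_events("SSRSSSR"))   # sender is "faster" than the receiver
```

A schedule that strictly alternates emissions and reads, which is what handshaking or a suitable scheduler would enforce, loses nothing.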
41
Example of CFSM Behaviors
A and B produce i1 and i2 at every i
C produces err or o at every i1, i2
Delay (i to o) for normal operation is n_r, for err operation 2n_r
Minimum input interval is n_i
Intuitive “correct” behavior
  No events are lost
[Figure: network of CFSMs A, B, and C with signals i, i1, i2, err, and o]
42
Equivalent Classes of CFSM Behavior
Assume parallel execution (HW, 1 CFSM/processor)
Equivalent classes of behaviors are:
  Zero delay: n_r = 0
  Input buffer overwrite: n_i <= n_r
  Time critical operation: n_i/2 <= n_r <= n_i
  Normal operation: n_r <= n_i/2
43
Equivalent Classes of CFSM Behavior (2)
Zero delay: n_r = 0
  If C emits an error on some input
    A, B can react instantaneously & output differently
  May be logically inconsistent
Input buffers overwrite: n_i <= n_r
  Execution delay of A, B is larger than arrival interval
    always loss of event
    requirements not satisfied
44
Equivalent Classes of CFSM Behavior (3)
Time critical operation: n_i/2 <= n_r <= n_i
  Normal operation results in no loss of event
  Error operation may cause lost input
Normal operation: n_r <= n_i/2
  No events are lost
  May be expensive to implement
If error is infrequent
  Designer may also accept time critical operation
    Can result in lower-cost implementation
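The four classes can be summarized as a function of the reaction delay n_r and the minimum input interval n_i. The sketch below is an assumption-laden reading: the comparison symbols did not survive in the transcript, so the boundary cases (for example n_r equal to n_i) are assigned arbitrarily here, and the function `classify` is a hypothetical name.

```python
def classify(n_i, n_r):
    """Classify CFSM network behavior for parallel execution.

    n_i: minimum input interval
    n_r: reaction delay in normal operation (error operation takes 2 * n_r)
    """
    if n_r == 0:
        return "zero delay"
    if 2 * n_r <= n_i:
        # Even an error reaction (2 * n_r) finishes before the next input.
        return "normal operation"
    if n_r <= n_i:
        # Normal reactions keep up; error reactions may lose an input.
        return "time critical operation"
    return "input buffer overwrite"   # reactions never keep up with inputs

for n_r in (0, 5, 7, 12):
    print(n_r, classify(10, n_r))
```

The test `2 * n_r <= n_i` makes the reasoning behind the normal class explicit: an error reaction takes 2n_r, so it must fit within one input interval for no event to be lost.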
45
Equivalent Classes of CFSM Behavior (4)
Implementation on a single processor
  Loss of event may be caused by
    Timing constraints: n_i < 3n_r
    Incorrect scheduling
      If empty transition also takes n_r
        ACBC round robin will miss event
        ABC round robin will not
46
Some Possible Equivalent Classes
Given 2 arbitrary implementations, 1 input stream:
  Dataflow equivalence
    Output streams have the same ordering
  Petri net equivalence
    Output streams satisfy some partial order
  Golden model equivalence
    Output streams have the same ordering
      Except reordering of concurrent events
    One of the implementations is a reference specification
  Filtered equivalence
    Output streams are the same after being filtered by an observer
47
Conclusion
CFSM
  Extension: ACFSM: Initially unbounded FIFO buffers
    Bounds on buffers are imposed by refinement to yield ECFSM
  Delay is also refined by implementation
  Local synchrony
    Relatively large atomic synchronous entities
  Global asynchrony
    Breaks synchrony, no compositional problem
    Allows efficient mapping to heterogeneous architectures
48
Outline
Part 3: Models of Computation
  FSMs
  Discrete Event Systems
  CFSMs
  Data Flow Models
  Petri Nets
  The Tagged Signal Model
49
Data-flow networks
A bit of history
Syntax and semantics
  actors, tokens and firings
Scheduling of Static Data-flow
  static scheduling
  code generation
  buffer sizing
Other Data-flow models
  Boolean Data-flow
  Dynamic Data-flow
50
Data-flow networks
Powerful formalism for data-dominated system specification
Partially-ordered model (no over-specification)
Deterministic execution independent of scheduling
Used for
  simulation
  scheduling
  memory allocation
  code generation
for Digital Signal Processors (HW and SW)
51
A bit of history
Karp computation graphs ('66): seminal work
Kahn process networks ('74): formal model
Dennis Data-flow networks ('75): programming language for MIT DF machine
Several recent implementations
  graphical:
    Ptolemy (UCB), Khoros (U. New Mexico), Grape (U. Leuven)
    SPW (Cadence), COSSAP (Synopsys)
  textual:
    Silage (UCB, Mentor)
    Lucid, Haskell
52
Data-flow network
A Data-flow network is a collection of functional nodes which are connected and communicate over unbounded FIFO queues
Nodes are commonly called actors
The bits of information that are communicated over the queues are commonly called tokens
53
Intuitive semantics
(Often stateless) actors perform computation
Unbounded FIFOs perform communication via sequences of tokens carrying values
  integer, float, fixed point
  matrix of integer, float, fixed point
  image of pixels
State implemented as self-loop
Determinacy: unique output sequences given unique input sequences
  Sufficient condition: blocking read (process cannot test input queues for emptiness)
54
Intuitive semantics
At each time, one actor is fired
When firing, actors consume input tokens and produce output tokens
Actors can be fired only if there are enough tokens in the input queues
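The firing rule can be sketched with FIFO queues from the Python standard library. The helper names `can_fire` and `fire_add`, and the choice of an adder that needs one token per input, are assumptions made for illustration.

```python
from collections import deque

def can_fire(inputs, needed=1):
    """An actor is enabled only if every input queue holds enough tokens."""
    return all(len(q) >= needed for q in inputs)

def fire_add(in_a, in_b, out):
    """Fire an adder actor once if enabled; return whether it fired."""
    if not can_fire([in_a, in_b]):
        return False
    # Firing consumes one token per input and produces one output token.
    out.append(in_a.popleft() + in_b.popleft())
    return True

a, b, out = deque([1, 2]), deque([10]), deque()
first = fire_add(a, b, out)    # enabled: both queues hold a token
second = fire_add(a, b, out)   # not enabled: b is empty
print(first, second, list(out), list(a))
```

The second call leaves the remaining token in `a` untouched, since a disabled actor consumes nothing.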
55
Intuitive semantics
Example: FIR filter
  single input sequence i(n)
  single output sequence o(n)
  o(n) = c1 i(n) + c2 i(n-1)
[Figure: dataflow graph in which input i feeds the multipliers * c1 and * c2, an adder + produces o, and i(-1) is the initial token]
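The FIR example can be simulated as a small dataflow graph and fired under different schedules. This is a sketch under assumptions: the graph is modeled with a fork actor, two multipliers, and an adder, the initial token i(-1) is taken to be 0, and all names (`run_fir`, the queue names) are hypothetical.

```python
from collections import deque

def run_fir(samples, c1, c2, order, initial=0):
    """Simulate o(n) = c1*i(n) + c2*i(n-1) as a dataflow graph.

    order lists the actor names in the order they are tried each round.
    """
    q = {name: deque() for name in ("i", "to_m1", "to_m2", "p1", "p2", "o")}
    q["i"].extend(samples)
    q["to_m2"].append(initial)   # initial token i(-1) on the delayed arc

    def fork():
        x = q["i"].popleft()
        q["to_m1"].append(x)
        q["to_m2"].append(x)

    def mul1():
        q["p1"].append(c1 * q["to_m1"].popleft())

    def mul2():
        q["p2"].append(c2 * q["to_m2"].popleft())

    def add():
        q["o"].append(q["p1"].popleft() + q["p2"].popleft())

    # actor name -> (queues it needs one token from, firing function)
    actors = {
        "fork": (["i"], fork),
        "mul1": (["to_m1"], mul1),
        "mul2": (["to_m2"], mul2),
        "add": (["p1", "p2"], add),
    }
    fired = True
    while fired:                 # stop when no actor is enabled
        fired = False
        for name in order:
            needs, fire = actors[name]
            if all(q[n] for n in needs):
                fire()
                fired = True
    return list(q["o"])

forward = run_fir([1, 2, 3], 2, 3, ["fork", "mul1", "mul2", "add"])
backward = run_fir([1, 2, 3], 2, 3, ["add", "mul2", "mul1", "fork"])
print(forward, backward)
```

Both schedules produce the same output sequence; what differs is how many tokens sit in the queues along the way.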
65
Questions
Does the order in which actors are fired affect the final result?
Does it affect the “operation” of the network in any way?
Go to Radio Shack and ask for an unbounded queue!!
66
Formal semantics: sequences
Actors operate from a sequence of input tokens to a sequence of output tokens
Let tokens be denoted by x_1, x_2, x_3, etc.
A sequence of tokens is defined as
  X = [ x_1, x_2, x_3, … ]
Over the execution of the network, each queue will grow a particular sequence of tokens
In general, we consider the actors mathematically as functions from sequences to sequences (not from tokens to tokens)
67
Ordering of sequences
Let X_1 and X_2 be two sequences of tokens.
We say that X_1 is less than X_2 (written X_1 <= X_2) if and only if (by definition) X_1 is an initial segment of X_2
Homework: prove that the relation so defined is a partial order (reflexive, antisymmetric and transitive)
This is also called the prefix order
Example: [ x_1, x_2 ] <= [ x_1, x_2, x_3 ]
Example: [ x_1, x_2 ] and [ x_1, x_3, x_4 ] are incomparable
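The prefix order is easy to experiment with on finite sequences. The sketch below is an illustration only: the helper names `is_prefix` and `comparable` are hypothetical, and Python lists stand in for finite token sequences (infinite sequences are not covered).

```python
def is_prefix(x1, x2):
    """True iff x1 <= x2 in the prefix order, i.e. x1 is an initial segment of x2."""
    return len(x1) <= len(x2) and list(x2[:len(x1)]) == list(x1)


def comparable(x1, x2):
    """Two sequences are comparable iff one is a prefix of the other."""
    return is_prefix(x1, x2) or is_prefix(x2, x1)


print(is_prefix(["x1", "x2"], ["x1", "x2", "x3"]))         # True
print(comparable(["x1", "x2"], ["x1", "x3", "x4"]))        # False
```

The two calls reproduce the two examples given on the slide.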
68
Chains of sequences
Consider the set S of all finite and infinite sequences of tokens
This set is partially ordered by the prefix order
A subset C of S is called a chain iff all pairs of elements of C are comparable
If C is a chain, then it must be a linear order inside S (otherwise, why call it chain?)
Example: { [ x_1 ], [ x_1, x_2 ], [ x_1, x_2, x_3 ], … } is a chain
Example: { [ x_1 ], [ x_1, x_2 ], [ x_1, x_3 ], … } is not a chain
69
(Least) Upper Bound
Given a subset Y of S, an upper bound of Y is an element z of S such that z is larger than or equal to all elements of Y
Consider now the set Z (subset of S) of all the upper bounds of Y
If Z has a least element u, then u is called the least upper bound (lub) of Y
The least upper bound, if it exists, is unique
Note: u might not be in Y (if it is, then it is the largest value of Y)
70
Complete Partial Order
Every chain in S has a least upper bound
Because of this property, S is called a Complete Partial Order
Notation: if C is a chain, we indicate the least upper bound of C by lub( C )
Note: the least upper bound may be thought of as the limit of the chain
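For a finite chain of finite sequences the least upper bound is simply the longest element. The sketch below illustrates that special case only: the function names are hypothetical, and the lub of an infinite chain (possibly an infinite sequence) cannot be represented this way.

```python
def is_prefix(x1, x2):
    """True iff x1 is an initial segment of x2 (prefix order)."""
    return len(x1) <= len(x2) and list(x2[:len(x1)]) == list(x1)


def is_chain(seqs):
    """A set of sequences is a chain iff every pair is comparable."""
    return all(is_prefix(a, b) or is_prefix(b, a) for a in seqs for b in seqs)


def lub_of_finite_chain(seqs):
    """Least upper bound of a finite chain of finite sequences: its longest element."""
    if not is_chain(seqs):
        raise ValueError("not a chain: some pair of sequences is incomparable")
    return max(seqs, key=len, default=[])  # the empty chain has lub [] (bottom)


print(lub_of_finite_chain([[1], [1, 2], [1, 2, 3]]))  # [1, 2, 3]
print(is_chain([[1], [1, 2], [1, 3]]))                # False
```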
71
Processes
Process: function from a p-tuple of sequences to a q-tuple of sequences
F : S^p -> S^q
Tuples have the induced point-wise order:
Y = ( y_1, … , y_p ), Y’ = ( y’_1, … , y’_p ) in S^p : Y <= Y’ iff y_i <= y’_i for all 1 <= i <= p
Given a chain C in S^p, F( C ) may or may not be a chain in S^q
We are interested in conditions that make that true
72
Continuity and Monotonicity
Continuity: F is continuous iff (by definition) for all chains C, lub( F( C ) ) exists and
F( lub( C ) ) = lub( F( C ) )
Similar to continuity in analysis using limits
Monotonicity: F is monotonic iff (by definition) for all pairs X, X’
X <= X’ => F( X ) <= F( X’ )
Continuity implies monotonicity
  intuitively, outputs cannot be “withdrawn” once they have been produced
  timeless causality. F transforms chains into chains
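The “outputs cannot be withdrawn” intuition can be checked on small cases. The sketch below tests monotonicity only over the prefixes of one finite input, so it is a spot check rather than a proof; both example processes (`running_sum`, `emit_zero_if_empty`) are made up for illustration.

```python
def is_prefix(x1, x2):
    """True iff x1 is an initial segment of x2 (prefix order)."""
    return len(x1) <= len(x2) and list(x2[:len(x1)]) == list(x1)


def running_sum(seq):
    """Monotonic process: more input can only extend the output."""
    out, total = [], 0
    for token in seq:
        total += token
        out.append(total)
    return out


def emit_zero_if_empty(seq):
    """Non-monotonic process: outputs [0] on empty input, then 'withdraws' it."""
    return [0] if not seq else list(seq)


def monotonic_on_prefixes(process, seq):
    """Check X <= X' => F(X) <= F(X') for all pairs of prefixes of seq."""
    prefixes = [seq[:k] for k in range(len(seq) + 1)]
    return all(is_prefix(process(a), process(b))
               for a in prefixes for b in prefixes if is_prefix(a, b))


print(monotonic_on_prefixes(running_sum, [1, 2, 3]))         # True
print(monotonic_on_prefixes(emit_zero_if_empty, [1, 2, 3]))  # False
```

The second process fails because its output for the empty input, [0], is not a prefix of its output for [1].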
73
Least Fixed Point semantics
Let X be the set of all sequences
A network is a mapping F from the sequences to the sequences
X = F( X, I )
The behavior of the network is defined as the unique least fixed point of the equation
If F is continuous then the least fixed point exists
LFP = LUB( { F^n( ⊥, I ) : n >= 0 } )
(⊥ denotes the empty sequence)
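The fixed point can be computed by iterating F starting from empty sequences. The sketch below assumes a hypothetical one-queue network (an accumulator whose output is fed back through a unit delay) and a finite input, so that the iteration terminates; neither the network nor the function names come from the slides.

```python
def network(x, i):
    """One application of F(X, I) for an accumulator with feedback.

    Output k needs input token k and, for k > 0, the fed-back output k-1,
    so only len(x) + 1 outputs can be produced from the current queue x.
    """
    out = []
    for k in range(min(len(i), len(x) + 1)):
        feedback = 0 if k == 0 else x[k - 1]
        out.append(i[k] + feedback)
    return out


def least_fixed_point(f, i):
    """Iterate x := f(x, i) from the empty sequence until x stops changing."""
    x = []            # bottom element: the empty sequence
    chain = [x]       # the iterates F^n(bottom, I), an increasing chain
    while True:
        nxt = f(x, i)
        if nxt == x:
            return x, chain
        chain.append(nxt)
        x = nxt


lfp, chain = least_fixed_point(network, [1, 2, 3])
print(chain)  # [[], [1], [1, 3], [1, 3, 6]]
print(lfp)    # [1, 3, 6]
```

Every iterate is a prefix of the next one, so the iterates form a chain and the fixed point is its least upper bound.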
74
From Kahn networks to Data Flow networks
Each process becomes an actor: set of pairs of
  firing rule (number of required tokens on inputs)
  function (including number of consumed and produced tokens)
Formally shown to be equivalent, but actors with firing are more intuitive
Mutually exclusive firing rules imply monotonicity
Generally simplified to blocking read
75
Examples of Data Flow actors
SDF: Synchronous (or, better, Static) Data Flow
  fixed input and output tokens
BDF: Boolean Data Flow
  control token determines consumed and produced tokens
[Figure: SDF actors: adder (+) with rates 1, 1, 1; FFT with rates 1024, 1024; an actor with rates 10, 1. BDF actors: merge and select, each with T and F branches]
76
Static scheduling of DF
Key property of DF networks: output sequences do not depend on time of firing of actors
SDF networks can be statically scheduled at compile-time
  execute an actor when it is known to be fireable
  no overhead due to sequencing of concurrency
  static buffer sizing
Different schedules yield different
  code size
  buffer size
  pipeline utilization
77
Static scheduling of SDF
Based only on process graph (ignores functionality)
Network state: number of tokens in FIFOs
Objective: find schedule that is valid, i.e.:
  admissible (only fires actors when fireable)
  periodic (brings network back to initial state firing each actor at least once)
Optimize cost function over admissible schedules
78
Balance equations
Number of produced tokens must equal number of consumed tokens on every edge
Repetitions (or firing) vector v_S of schedule S: number of firings of each actor in S
v_S(A) n_p = v_S(B) n_c
must be satisfied for each edge
[Figure: edge from A to B; A produces n_p tokens per firing, B consumes n_c tokens per firing]
79
Balance equations
[Figure: SDF graph with actors A, B, C. Edge A->B: A produces 3, B consumes 1. Edge B->C: B produces 1, C consumes 1. Edge A->C: A produces 2, C consumes 1]
Balance for each edge:
  3 v_S(A) - v_S(B) = 0
  v_S(B) - v_S(C) = 0
  2 v_S(A) - v_S(C) = 0
80
Balance equations
M v_S = 0
iff S is periodic
Full rank (as in this case)
  no non-zero solution
  no periodic schedule
  (too many tokens accumulate on A->B or B->C)
M =
  |  3  -1   0 |
  |  0   1  -1 |
  |  2   0  -1 |
[Figure: SDF graph with actors A, B, C. Edge A->B: A produces 3, B consumes 1. Edge B->C: B produces 1, C consumes 1. Edge A->C: A produces 2, C consumes 1]
81
Balance equations
Non-full rank
  infinite solutions exist (linear space of dimension 1)
Any multiple of q = |1 2 2|^T satisfies the balance equations
ABCBC and ABBCC are minimal valid schedules
ABABBCBCCC is a non-minimal valid schedule
M =
  |  2  -1   0 |
  |  0   1  -1 |
  |  2   0  -1 |
[Figure: SDF graph with actors A, B, C. Edge A->B: A produces 2, B consumes 1. Edge B->C: B produces 1, C consumes 1. Edge A->C: A produces 2, C consumes 1]
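The rank test and the vector q can be reproduced with exact rational arithmetic. The sketch below is one possible way to do it (plain Gaussian elimination with `fractions.Fraction`), not the method used by any particular tool. The function names are hypothetical, and the two matrices are the topology matrices of the two example graphs (A producing 3 or 2 tokens on A->B).

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def rank_and_null_basis(matrix):
    """Reduce M to reduced row echelon form; return (rank, null-space basis)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols, r = [], 0
    for c in range(n_cols):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                factor = rows[k][c]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivot_cols):
            vec[pc] = -rows[row_idx][free]
        basis.append(vec)
    return r, basis


def repetition_vector(matrix):
    """Smallest positive integer q with M q = 0, or None if rank != n - 1."""
    rank, basis = rank_and_null_basis(matrix)
    if rank != len(matrix[0]) - 1:
        return None
    vec = basis[0]
    scale = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in vec), 1)
    ints = [int(v * scale) for v in vec]
    g = reduce(gcd, ints)
    ints = [v // g for v in ints]
    return [-v for v in ints] if any(v < 0 for v in ints) else ints


print(repetition_vector([[3, -1, 0], [0, 1, -1], [2, 0, -1]]))  # None (full rank)
print(repetition_vector([[2, -1, 0], [0, 1, -1], [2, 0, -1]]))  # [1, 2, 2]
```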
82
Static SDF scheduling
Main SDF scheduling theorem (Lee ‘86):
  A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1
  If M has rank n-1 then there exists a unique smallest positive integer solution q to M q = 0
Rank must be at least n-1 because we need at least n-1 edges (connected-ness), each providing a linearly independent row
Admissibility is not guaranteed, and depends on initial tokens on cycles
83
Admissibility of schedules
[Figure: SDF graph with actors A, B, C; rate labels as extracted: 1, 2, 1, 3, 2, 3]
No admissible schedule:
  BACBA, then deadlock…
Adding one token (delay) on A->C makes BACBACBA valid
Making a periodic schedule admissible is always possible, but changes specification...
84
Admissibility of schedules
Adding initial token changes FIR order
[Figure: FIR filter data flow graph: input i, multipliers * c1 and * c2, adder + producing output o; the delayed branch now holds two initial tokens, i(-1) and i(-2)]
85
From repetition vector to schedule
Repeatedly schedule fireable actors up to number of times in repetition vector
q = |1 2 2|^T
Can find either ABCBC or ABBCC
If deadlock before original state, no valid schedule exists (Lee ‘86)
[Figure: SDF graph with actors A, B, C. Edge A->B: A produces 2, B consumes 1. Edge B->C: B produces 1, C consumes 1. Edge A->C: A produces 2, C consumes 1]
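The construction can be simulated by tracking token counts on each edge. The sketch below is an illustrative rendering of the idea: the edge tuple layout, the function name and the tie-breaking parameter `pick` are assumptions of this example.

```python
def build_schedule(edges, q, pick=min):
    """Fire fireable actors until each actor a has fired q[a] times.

    edges: list of (src, dst, produced, consumed, initial_tokens)
    q:     dict mapping actor name -> number of firings per period
    pick:  tie-breaking rule applied to the list of fireable actors
    Returns the schedule as a string, or None if the network deadlocks.
    """
    tokens = [e[4] for e in edges]
    remaining = dict(q)
    schedule = []
    while any(remaining.values()):
        fireable = [
            a for a in sorted(remaining)
            if remaining[a] > 0
            and all(tokens[k] >= c
                    for k, (_, d, _, c, _) in enumerate(edges) if d == a)
        ]
        if not fireable:
            return None  # deadlock before the period is complete
        actor = pick(fireable)
        for k, (s, d, p, c, _) in enumerate(edges):
            if d == actor:
                tokens[k] -= c  # consume from input edges
            if s == actor:
                tokens[k] += p  # produce on output edges
        remaining[actor] -= 1
        schedule.append(actor)
    return "".join(schedule)


graph = [("A", "B", 2, 1, 0), ("B", "C", 1, 1, 0), ("A", "C", 2, 1, 0)]
q = {"A": 1, "B": 2, "C": 2}
print(build_schedule(graph, q, pick=min))  # ABBCC
print(build_schedule(graph, q, pick=max))  # ABCBC
```

Different tie-breaking rules yield the two minimal schedules named on the slide. A two-actor cycle without initial tokens makes the function return None, which corresponds to the deadlock case.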
86
From schedule to implementation
Static scheduling used for:
  behavioral simulation of DF (extremely efficient)
  code generation for DSP
  HW synthesis (Cathedral by IMEC, Lager by UCB, …)
Issues in code generation
  execution speed (pipelining, vectorization)
  code size minimization
  data memory size minimization (allocation to FIFOs)
  processor or functional unit allocation
87
Compilation optimization
Assumption: code stitching (chaining custom code for each actor)
More efficient than C compiler for DSP
Comparable to hand-coding in some cases
Explicit parallelism, no artificial control dependencies
Main problem: memory and processor/FU allocation depends on scheduling, and vice-versa
88
Code size minimization
Assumptions (based on DSP architecture):
  subroutine calls expensive
  fixed iteration loops are cheap (“zero-overhead loops”)
Absolute optimum: single appearance schedule
  e.g. ABCBC -> A (2BC), ABBCC -> A (2B) (2C)
  may or may not exist for an SDF graph…
  buffer minimization relative to single appearance schedules (Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)
89
Buffer size minimization
Assumption: no buffer sharing
Example:
q = | 100 100 10 1 |^T
Valid SAS: (100 A) (100 B) (10 C) D
  requires 210 units of buffer area
Better (factored) SAS: (10 (10 A) (10 B) C) D
  requires 30 units of buffer area, but…
  requires 21 loop initiations per period (instead of 3)
[Figure: SDF graph with actors A, B, C, D. Edges A->C and B->C: producer emits 1, C consumes 10. Edge C->D: C produces 1, D consumes 10]
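The buffer and loop-initiation figures can be recomputed by simulating each looped schedule. In the sketch below a schedule is a nested list in which a string is one actor firing and a tuple `(count, body)` is a loop. That encoding, the function names and the edge rates (inferred from q) are assumptions of this example.

```python
def firings(schedule):
    """Expand a looped schedule into a flat stream of actor firings."""
    for item in schedule:
        if isinstance(item, str):
            yield item
        else:
            count, body = item
            for _ in range(count):
                yield from firings(body)


def loop_initiations(schedule):
    """Count how many times a loop is entered during one period."""
    total = 0
    for item in schedule:
        if not isinstance(item, str):
            count, body = item
            total += 1 + count * loop_initiations(body)
    return total


def buffer_area(edges, schedule):
    """Sum over edges of the peak token count, i.e. no buffer sharing."""
    tokens = [0] * len(edges)
    peak = [0] * len(edges)
    for actor in firings(schedule):
        for k, (src, dst, produced, consumed) in enumerate(edges):
            if dst == actor:
                if tokens[k] < consumed:
                    raise ValueError(f"{actor} fired while not fireable")
                tokens[k] -= consumed
            if src == actor:
                tokens[k] += produced
                peak[k] = max(peak[k], tokens[k])
    return sum(peak)


edges = [("A", "C", 1, 10), ("B", "C", 1, 10), ("C", "D", 1, 10)]
flat = [(100, ["A"]), (100, ["B"]), (10, ["C"]), "D"]
factored = [(10, [(10, ["A"]), (10, ["B"]), "C"]), "D"]
print(buffer_area(edges, flat), loop_initiations(flat))          # 210 3
print(buffer_area(edges, factored), loop_initiations(factored))  # 30 21
```

The factored schedule trades buffer area for loop overhead: the two inner loops are each re-entered on every one of the ten outer iterations.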
90
Dynamic scheduling of DF
SDF is limited in modeling power
  no run-time choice
  cannot implement Gaussian elimination with pivoting
More general DF is too powerful
  non-Static DF is Turing-complete (Buck ‘93)
  bounded-memory scheduling is not always possible
BDF: semi-static scheduling of special “patterns”
  if-then-else
  repeat-until, do-while
General case: thread-based dynamic scheduling
  (Parks ‘96: may not terminate, but never fails if feasible)
91
Example of Boolean DF
Compute absolute value of average of n samples
[Figure: BDF graph from In to Out; labels include +1 and + blocks, a - block, >n and <0 comparisons, initial 0 tokens, and several actors with T/F branches]
92
Example of general DF
Merge streams of multiples of 2 and 3 in order (removing duplicates)
Deterministic merge (no “peeking”)
[Figure: streams A and B, produced by loops containing * 2 / dup and * 3 / dup actors, feed an ordered merge actor whose output O goes to out]
a = get (A)              // blocking read of the first token of each stream
b = get (B)
forever {
  if (a < b) {           // a is smaller: emit it and refill from A
    put (O, a)
    a = get (A)
  } else if (a > b) {    // b is smaller: emit it and refill from B
    put (O, b)
    b = get (B)
  } else {               // equal: emit once (removes the duplicate)
    put (O, a)
    a = get (A)
    b = get (B)
  }
}
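The ordered merge can be mimicked with Python generators, where `next()` plays the role of the blocking `get`. This is a sketch under stated assumptions: both input streams are taken to be infinite and ascending, the streams are generated directly rather than by the feedback loops of the figure, and the names `multiples` and `ordered_merge` are hypothetical. The smaller head is always emitted first, which is what merging in ascending order requires.

```python
from itertools import count, islice


def multiples(k):
    """Infinite ascending stream k, 2k, 3k, ..."""
    return (k * n for n in count(1))


def ordered_merge(stream_a, stream_b):
    """Merge two infinite ascending streams in order, dropping duplicates."""
    a = next(stream_a)  # blocking read on A
    b = next(stream_b)  # blocking read on B
    while True:
        if a < b:
            yield a
            a = next(stream_a)
        elif a > b:
            yield b
            b = next(stream_b)
        else:  # same value on both streams: emit it once
            yield a
            a = next(stream_a)
            b = next(stream_b)


print(list(islice(ordered_merge(multiples(2), multiples(3)), 8)))
# [2, 3, 4, 6, 8, 9, 10, 12]
```

The process never tests whether a token is available. It commits to reading a specific input and waits, which is what keeps the merge deterministic.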
93
Summary of DF networks
Advantages:
  Easy to use (graphical languages)
  Powerful algorithms for
    verification (fast behavioral simulation)
    synthesis (scheduling and allocation)
  Explicit concurrency
Disadvantages:
  Efficient synthesis only for restricted models (no input or output choice)
  Cannot describe reactive control (blocking read)
94
Outline
Part 3: Models of Computation
  FSMs
  Discrete Event Systems
  CFSMs
  Data Flow Models
  Petri Nets
  The Tagged Signal Model