1© Krithi Ramamritham / Kavi Arya
Embedded Systems Software
Prof. Krithi Ramamritham, Prof. Kavi Arya
IIT Bombay CEP - DEP 2003
2© Krithi Ramamritham / Kavi Arya
Embedded Systems?
3© Krithi Ramamritham / Kavi Arya
Embedded Systems
• Single-function, e.g. pager, mobile phone
• Tightly constrained
– cost, size, performance, power, etc.
• Reactive & real-time
– e.g. car’s cruise controller
– delay in computation => failure of system
4© Krithi Ramamritham / Kavi Arya
Hardware is not the whole System !!!
A Micro-Electronic System is the result of a projection of …
– Architecture
– Hardware
– Software
… distinguished by its gross Functional Behaviour !
• Software is an important part of the Product and must be part of the Design Process
… or we are only designing a Component of the system.
5© Krithi Ramamritham / Kavi Arya
Why Is Embedded Software Not Just Software On Small Computers?
• Embedded = Dedicated
• Interaction with physical processes
– sensors, actuators, processes
• Critical properties are not all functional
– real-time, fault recovery, power, security, robustness
• Heterogeneity
– hardware/software tradeoffs, mixed architectures
• Concurrency
– interaction with multiple processes
• Reactivity
– operating at the speed of the environment
These features look more like hardware!
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001
6© Krithi Ramamritham / Kavi Arya
What is Embedded SW?
One definition:
“Software that is directly in contact with, or significantly affected by, the hardware that it executes on, or can directly influence the behavior of that hardware.”
7© Krithi Ramamritham / Kavi Arya
What is Embedded SW?
• What is it not?
• Application software can be recompiled and executed on any number of hardware platforms so long as the basic services/libraries are provided.
– It is divided by vertical market segments (application domains)
– Well-established methodologies, architectures, …
– HW platform independent, highly portable
• Any SW that has no direct relationship with HW.
8© Krithi Ramamritham / Kavi Arya
Embedded System Challenges for HW Folks
• PARADIGM CHANGE!
– Designers’ main task shifts from processor integration to performance analysis
– Concentration on functional requirements instead of integration work
– Concentration on architectural exploration (including performance analysis)
• Re-use and platform-based design become key!
– Early validation of system/solution correctness
– Parallel hardware and software development
– More effective use of previous work
– Faster ways to build new elements of a solution
– Ways to test more effectively, efficiently, quickly
9© Krithi Ramamritham / Kavi Arya
Software Guys can Learn from Hardware Experts!
• Concurrency
– the synchrony abstraction
– event-driven modeling
• Reusability
– cell libraries
– interface definition
• Reliability
– leveraging limited abstractions
– leveraging verification
• Heterogeneity
– mixing synchronous and asynchronous designs
– resource management
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001
10© Krithi Ramamritham / Kavi Arya
Trade-offs. Methodology ESW Architectural specifics
• Portability
– ESW itself is intended to provide portability for higher SW layers
– (At least parts of) ESW is by definition not portable
• Real-time
– Restricted use of standardized inter-process communication (IPC) mechanisms (CORBA, …) for performance reasons
– Typically hard real-time requirements
• RTOS dependency
– Implementation of OS-like services
– Sometimes shielding of the RTOS from higher-level SW layers
– Direct dependency on RTOS implementation
11© Krithi Ramamritham / Kavi Arya
Functional Design & Mapping
[Figure: a functional design (functions F1 to F5) is mapped through the architectural design onto threads running over RTOS/drivers and a hardware interface to HW1 to HW4]
Source: Ian Phillips, ARM, VSIA 2001
12© Krithi Ramamritham / Kavi Arya
The Embedded Market: Disruptive Change
Traditional Embedded World
• Never small enough
• Never fast enough
• Headless/Character-based
• Standalone
• Boot & Run from ROM
• More Hardware than Software
• Low-Level Programming Model
• Application tied to hardware

Today’s Embedded World
• Never functional enough
• Always connected
• High Integration Chips (ASIC/SOC)
• Architectural diversity
• COTS & custom hardware
• EPROM/Flash/Rotating Media
• Software Intensive
• Web interfaces
• OOP Programming Model
• Standard applications

• Time to Market Pressures
• Shortage of Embedded SW Engineers
Source: Jim Ready, President / CEO, MontaVista Software
© Krithi Ramamritham / Kavi Arya
Plan
• Embedded Systems
• New Approaches to building ESW
– New paradigms: Lava, Handel-C
– Examples + “Engineering Returns to Software”
– Build a RISC processor in 48hrs
– Advantages of reconfigurable hardware.
• Real-time support for ESW
14© Krithi Ramamritham / Kavi Arya
Motorola Software Survey Findings
• Hardware design is a software task: IC designers write code (VHDL, Verilog, Scripting)!
• We must become a software-intensive embedded system solutions company, focused on integrating our platforms into users’ products; in the future we’ll be neither a hardware nor a software company
– Focus on developing systems capability, not just a software counterpart to our current hardware capability (though that’s needed too)
– We should have software content from drivers to applications
• The fundamental goal isn’t 70% margin on software products, it’s helping someone choose your total solution
– Embedded systems platforms and solutions will be the key to market differentiation and profitable growth
Source: Bob Altizer, BASYS, VSIA 2001
15© Krithi Ramamritham / Kavi Arya
Common Design Metrics
• NRE (Non-recurring engineering) cost
• Unit cost
• Size (bytes, gates)
• Performance (execution time)
• Power (more power => more heat & less battery time)
• Flexibility (ability to change functionality)
16© Krithi Ramamritham / Kavi Arya
• Time to prototype
• Time to market
• Maintainability
• Correctness
• Safety (probability that system won’t cause harm)
17© Krithi Ramamritham / Kavi Arya
Time to Market Design Metric
• Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
• Loss
– The difference between the on-time and delayed triangle areas
• Avg. time to market today = 8 months
• 1 day delay may amount to $Ms
– see Sony PlayStation vs Xbox
[Figure: revenues ($) vs. time; on-time and delayed market-entry triangles, market rise to peak revenue at W, market fall to 2W, delayed entry at D with a lower peak revenue]
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
18© Krithi Ramamritham / Kavi Arya
NRE and unit cost metrics
• Compare technologies by costs -- best depends on quantity
– Technology A: NRE=$2,000, unit=$100
– Technology B: NRE=$30,000, unit=$30
– Technology C: NRE=$100,000, unit=$2
[Figure: two charts for technologies A, B and C over 0 to 2400 units: total cost and per-product cost, each plotted against number of units (volume)]
• But, must also consider time-to-market
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
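The cost comparison on this slide can be reproduced with a few lines of Python. This is only a sketch: the NRE and unit costs come from the slide, while the function names (`total_cost`, `per_product_cost`, `cheapest`, `break_even`) are illustrative and not from the source.

```python
# NRE and unit costs are the slide's figures; all names are illustrative.
TECHNOLOGIES = {
    "A": {"nre": 2_000, "unit": 100},
    "B": {"nre": 30_000, "unit": 30},
    "C": {"nre": 100_000, "unit": 2},
}


def total_cost(name, volume):
    """Total cost = NRE cost + unit cost * number of units."""
    tech = TECHNOLOGIES[name]
    return tech["nre"] + tech["unit"] * volume


def per_product_cost(name, volume):
    """Total cost spread evenly over every unit produced."""
    return total_cost(name, volume) / volume


def cheapest(volume):
    """Technology with the lowest total cost at the given volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(name, volume))


def break_even(first, second):
    """Volume at which two technologies have the same total cost."""
    a, b = TECHNOLOGIES[first], TECHNOLOGIES[second]
    return (b["nre"] - a["nre"]) / (a["unit"] - b["unit"])


if __name__ == "__main__":
    for volume in (100, 1000, 3000):
        print(volume, cheapest(volume))
    print(break_even("A", "B"), break_even("B", "C"))
```

Under these figures A is cheapest at low volume, B takes over at 400 units, and C only pays off beyond 2500 units, which is why the best technology depends on quantity.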
19© Krithi Ramamritham / Kavi Arya
Losses due to delayed market entry
• Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W) * (W-D)
• Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
• Try some examples
– Lifetime 2W = 52 wks, delay D = 4 wks
– (4*(3*26 - 4) / (2*26^2)) * 100% = 22%
– Lifetime 2W = 52 wks, delay D = 10 wks
– (10*(3*26 - 10) / (2*26^2)) * 100% = 50%
– Delays are costly!
[Figure: the on-time vs. delayed revenue triangles from the “Time to Market Design Metric” slide]
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
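As a quick check of the triangle model, the sketch below computes the loss directly from the two triangle areas and compares it with the closed form D(3W-D)/(2W^2). The function names are illustrative; only the formulas and the two worked examples come from the slide.

```python
def revenue_loss_percent(w, d):
    """Loss computed from the two triangle areas (W = half the product life, D = delay)."""
    on_time = 0.5 * (2 * w) * w
    delayed = 0.5 * (w - d + w) * (w - d)
    return (on_time - delayed) / on_time * 100


def revenue_loss_closed_form(w, d):
    """The slide's simplified expression: D(3W - D) / (2W^2) * 100%."""
    return d * (3 * w - d) / (2 * w ** 2) * 100


if __name__ == "__main__":
    for delay in (4, 10):
        print(delay, round(revenue_loss_percent(26, delay)))
```

Both routes give about 22% for a 4-week delay and about 50% for a 10-week delay on a 52-week product life.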
20© Krithi Ramamritham / Kavi Arya
Trends
• Moore’s Law
– IC transistor capacity doubles every 18 months
– 1981: leading edge chip had 10k transistors
– 2002: leading edge chip has 150M transistors
• Designer productivity has improved due to better tools:
– Compilation/Synthesis tools
– Libraries/IP
– Test/verification tools
– Standards
– Languages and frameworks (Handel-C, Lava, Esterel, …)
– 1981: designer produced 100 transistors per month
– 2002: designer produces 5000 transistors per month
21© Krithi Ramamritham / Kavi Arya
Our New Understanding
• We have simultaneous optimisations of competing design metrics: speed, size, power, complexity, etc.
• We need a “Renaissance Engineer”
– with a holistic view of the design process, comfortable with technologies ranging from hardware and software to formal methods
• Maturation of behavioral synthesis tools and other tools has enabled this kind of unified view of hardware/software co-design.
• Design efforts now focus at higher levels of abstraction => abstract specifications are now refined into programs and then into gates and logic.
• There is no fundamental difference between what hardware and software can implement.
22© Krithi Ramamritham / Kavi Arya
Designer Productivity
• “The Mythical Man-Month” by Frederick Brooks, 1975
• More designers on team => lower productivity because of increasing communication costs between groups
• Consider a 1M transistor project:
– Say a designer has a productivity of 5000 transistors/mth
– Each extra designer => decrease of 100 transistors/mth in every designer’s productivity due to comm. costs
– 1 designer: 1M/5000 = 200 mth
– 10 designers: 1M/(10*4100) = 24.3 mth
– 25 designers: 1M/(25*2600) = 15.3 mth
– 27 designers: 1M/(27*2400) = 15.4 mth
• Need new design technology to shrink the design gap
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
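The team-size arithmetic above can be explored with a small sketch. The productivity figures (5000 transistors/mth, minus 100 per extra designer) are the slide's; the function name `project_months` and its parameters are illustrative assumptions.

```python
def project_months(transistors, designers, base=5000, penalty=100):
    """Months needed when each extra designer costs everyone `penalty` transistors/mth."""
    per_designer = base - penalty * (designers - 1)
    return transistors / (designers * per_designer)


if __name__ == "__main__":
    for team in (1, 10, 25, 27):
        print(team, round(project_months(1_000_000, team), 2))
    best = min(range(1, 50), key=lambda n: project_months(1_000_000, n))
    print("shortest schedule at team size", best)
```

In this model the schedule stops improving at around 25 or 26 designers and gets longer beyond that, which is the point the slide makes about adding people to a project.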
23© Krithi Ramamritham / Kavi Arya
Design Productivity Gap
• Designer productivity has grown over the last decade
• Rate of improvement has not kept pace with the chip-capacity growth
• 1981: leading edge chip:
– 100 designer mth * 100 trans/mth => 10k trans complexity
• 2002: leading edge chip:
– 30k designer mth * 5k trans/mth => 150M trans complexity
• Designers at avg. of $10k per month => cost of building leading edge chips has gone from $1M in 1981 to $300M in 2002
• Need paradigm shift to cope with the complexities of system design
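The cost figures follow from dividing chip complexity by designer productivity. A minimal sketch, with illustrative function names and the slide's numbers (10k and 150M transistors, 100 and 5000 transistors per designer-month, $10k per designer-month):

```python
def designer_months(transistors, transistors_per_month):
    """Effort needed to design a chip of the given complexity."""
    return transistors / transistors_per_month


def design_cost(transistors, transistors_per_month, cost_per_month=10_000):
    """Effort multiplied by the average monthly cost of a designer."""
    return designer_months(transistors, transistors_per_month) * cost_per_month


if __name__ == "__main__":
    print(design_cost(10_000, 100))           # 1981 leading edge chip
    print(design_cost(150_000_000, 5_000))    # 2002 leading edge chip
```

Capacity grew by a factor of 15,000 while productivity grew by a factor of 50, so the design effort grew by a factor of 300.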
© Krithi Ramamritham / Kavi Arya
Plan
• Embedded Systems
• New Approaches to building ESW
– New paradigms: Lava, Handel-C
– Examples + “Engineering Returns to Software”
– Build a RISC processor in 48hrs
– Advantages of reconfigurable hardware.
• Real-time support for ESW
25© Krithi Ramamritham / Kavi Arya
Lava
• Not so much a hardware description language
• More a style of circuit description
• Emphasises connection patterns
• Think of Lego
26© Krithi Ramamritham / Kavi Arya
Lava
• Mary Sheeran, Koen Claessen & Satnam Singh, Chalmers University (Sweden)
• Based on earlier work on MuFP to describe circuit functionality and layout in single language
• Built using functional programming paradigm
27© Krithi Ramamritham / Kavi Arya
Behaviour and Structure
[Figure: serial composition f ->- g, with the output of f feeding the input of g]
28© Krithi Ramamritham / Kavi Arya
Lava Properties
• Higher-order functions
– Circuits are functions
– May be passed as arguments to other functions.
– => Easier to produce parameterized circuits than with VHDL.
• Functions can return circuits as results
– Circuit combinators take circuits as arguments, return circuits as results.
– => Powerful glue for composing circuits to form larger systems.
• Circuit combinators combine behavior + layout
– Combinators lay out circuits in rows, columns, triangles, trees etc.
• Performance of circuit
– Improved by exploring the layout design space by experimenting with alternative layout combinators.
• Examples of circuits produced:
– High speed constant coefficient multipliers, finite impulse response filters (1D and 2D), adder tree networks and sorting butterfly networks.
29© Krithi Ramamritham / Kavi Arya
Parallel Connection Patterns
[Figure: parallel composition f -|- g, with f and g placed side by side, each working on its own input]
30© Krithi Ramamritham / Kavi Arya
map f
[Figure: map f places one copy of circuit f on each element of the input list]
31© Krithi Ramamritham / Kavi Arya
Four Sided Tiles
32© Krithi Ramamritham / Kavi Arya
Column
33© Krithi Ramamritham / Kavi Arya
Full Adder
fa (cin, (a,b)) = (sum, cout)
  where
    part_sum = xor (a, b)
    sum      = xorcy (part_sum, cin)
    cout     = muxcy (part_sum, (a, cin))

[Figure: full adder block fa with inputs a, b, cin and outputs sum, cout]
34© Krithi Ramamritham / Kavi Arya
Generic Adder
adder = col fa

[Figure: a column of fa blocks, with the carry passed from each block to the next]
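The behaviour of the full adder and of the column of full adders can be modelled in software. The sketch below is a behavioural model only, not Lava: it assumes that `muxcy (sel, (d0, d1))` returns `d1` when `sel` is 1 and `d0` otherwise, and that `col` threads the carry through the bits starting at the least significant one. The helpers `to_bits` and `add_words` are hypothetical names introduced for the example.

```python
def xorcy(part_sum, cin):
    """Model of the carry-chain XOR: sum bit."""
    return part_sum ^ cin


def muxcy(sel, data):
    """Model of the carry-chain mux (assumed: data[1] when sel is 1, else data[0])."""
    d0, d1 = data
    return d1 if sel else d0


def fa(cin, ab):
    """Full adder with the same structure as the Lava description."""
    a, b = ab
    part_sum = a ^ b
    return xorcy(part_sum, cin), muxcy(part_sum, (a, cin))


def adder(cin, bit_pairs):
    """Software stand-in for `col fa`: ripple the carry through the pairs, LSB first."""
    sums, carry = [], cin
    for pair in bit_pairs:
        s, carry = fa(carry, pair)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    """Little-endian list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def add_words(x, y, width=16):
    """Add two unsigned words bit by bit and reassemble sum and carry-out."""
    sums, carry = adder(0, list(zip(to_bits(x, width), to_bits(y, width))))
    return sum(bit << i for i, bit in enumerate(sums)) + (carry << width)


if __name__ == "__main__":
    print(add_words(40000, 30000))
```

Because `adder` works on a list of any length, the same description covers every word size, which mirrors the "generic adder" idea of the slide.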
35© Krithi Ramamritham / Kavi Arya
Top Level
adder16Circuit
  = do a <- inputVec "a" (bit_vector 15 downto 0)
       b <- inputVec "b" (bit_vector 15 downto 0)
       (s, carry) <- adder4 (a, b)
       sum <- outputVec "sum" s (bit_vector 16 downto 0)

? circuit2VHDL "add16" adder16Circuit
? circuit2EDIF "add16" adder16Circuit
? circuit2Verilog "add16" adder16Circuit
36© Krithi Ramamritham / Kavi Arya
Xilinx FPGA Implementation
• 16-bit implementation on a XCV300 FPGA
• Vertical layout required to exploit fast carry chain
• No need to specify coordinates in HDL code
37© Krithi Ramamritham / Kavi Arya
16-bit Adder Layout
Source: Mary Sheeran Nov.2002
38© Krithi Ramamritham / Kavi Arya
Four adder trees
Source: Mary Sheeran Nov.2002
39© Krithi Ramamritham / Kavi Arya
No Layout Information
Source: Mary Sheeran Nov.2002
© Krithi Ramamritham / Kavi Arya
Plan
• Embedded Systems
• New Approaches to building ESW
– New paradigms: Lava, Handel-C
– Examples + “Engineering Returns to Software”
– Build a RISC processor in 48hrs
– Advantages of reconfigurable hardware.
• Real-time support for ESW
© Krithi Ramamritham / Kavi Arya
Handel-C
• Programming language- enables compilation of programs into synchronous hardware
• NOT Hardware Description Language- it’s a prog. language aimed at compiling high-level algorithms into gate-level hardware
• Syntax (loosely) based on “C”
• Handel-C is to hardware (gates) what “C” is to micro-assembly code
© Krithi Ramamritham / Kavi Arya
Handel-C (cont.)
• Inventor - Ian Page, Programming Research Group (Oxford University/UK)
• Semantics based on Hoare’s Communicating Sequential Processes (CSP) model & Occam (the transputer programming language)
• Industry heavyweights using tools: Marconi, Ericsson, BAe, Creative Labs, etc.
© Krithi Ramamritham / Kavi Arya
What this means
• Hardware design produced is exactly the hardware specified in source program
• No intermediate “interpreting” layer as in assembly language targeting general purpose microprocessor
• Logic gates are assembly instructions of Handel-C system
• Design/re-design/optimise at software level!!!
© Krithi Ramamritham / Kavi Arya
What This Means
• True parallelism
– not the time-shared (interpreted) parallelism of general-purpose computers
• PAR {a;b}
– instructions executed in parallel (//) at the same instant of time by 2 separate pieces of hardware
• Timing
– branches that complete early are forced to wait for the slowest branch before continuing
© Krithi Ramamritham / Kavi Arya
Comparison with “C”
• Similar:
– Programs inherently sequential
– Similar control-flow constructs: if-then-else, switch, while, for, etc.
• Dissimilar:
– No malloc / dynamic store allocation
– No recursion (limited recursion in macros)
– No nested procedures
– No stdin/stdout
– “void main()”
– Variable width words
– PAR, etc.
© Krithi Ramamritham / Kavi Arya
Handel-C is based on
• ANSI-standard C without external library functions:
– I/O functions: printf(), putc(), scanf(), ...
– File functions: fopen(), fclose(), fprintf(), ...
– String functions: strlen(), strcpy(), strcmp(), …
– Math functions: sin(), cos(), sqrt(), …
– ...
© Krithi Ramamritham / Kavi Arya
Supported declarations, statements & instructions:
• Main program structure
• Variables
• Arrays
• Switch statement
• FOR Loop
• Comments
• Constants
• Scope & Variable sharing
• Arithmetic, Relational, Relational Logic ops
• Conditional Execution
• While loop
• Do … While Loop
© Krithi Ramamritham / Kavi Arya
Channel Communication
• link!v … link?v
– channel input is a form of assignment
• Provides link between parallel (‘//’) branches
– One // branch outputs data onto channel
– Other // branch reads data from channel
• => Synchronisation
– data transfers only when both processes are ready
© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Channel
unsigned int 8 a;
chan unsigned int 8 c;
c ! 5;   // send the value 5 on channel c
c ? a;   // receive from channel c into a
© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Prialt
prialt
{
case CommsStatement:
Statement
break;
...
default:
Statement
break;
}
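[Figure: parallel processes communicating over channels A, B, C and D: A!1 and B!2 are received by A?u and B?v, C!8 and D!9 are received by C?x and D?y]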
© Krithi Ramamritham / Kavi Arya
Example 1 (sum)
void main(void)
{
    unsigned int 16 sum;     // variable width word
    unsigned int 8 data;
    chanin input;            // input/output
    chanout output;

    sum = 0;
    do
    {
        input ? data;
        sum = sum + (0 @ data);   // pad data to 16 bits before adding
    } while (data != 0);
    output ! sum;
}
IMPORTANT – width!!
© Krithi Ramamritham / Kavi Arya
Example 2 (divider)
#define DATA_WIDTH 16
void main(void)
{
    unsigned int DATA_WIDTH a, mult, result;
    unsigned int (DATA_WIDTH*2 - 1) b;
    chanin input;
    chanout output;
    while (1)
    {
        input ? a;                      // dividend
        input ? result;                 // divisor, held in result for now
        b = result @ 0;                 // divisor moved to the top bits of b
        mult = 1 << (DATA_WIDTH - 1);
        result = 0;
        <<<<< MAIN LOOP >>>>>
        output ! result;
    }
}
result = integer(a / b)
© Krithi Ramamritham / Kavi Arya
Example 2 (cont.)
while (mult != 0)
{
    if ((0 @ a) >= b)
        par
        {
            a -= b <- width(a);   // subtract the low bits of b from a
            result |= mult;       // set this quotient bit
        }
    par
    {
        b = b >> 1;
        mult = mult >> 1;
    }
}
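The divider is a shift-and-subtract (long division) loop. A software model of the same steps is sketched below, assuming the second value read from `input` is the divisor and that it is non-zero; the function name `shift_subtract_divide` is illustrative. Python integers are unbounded, so the explicit word widths of the Handel-C version are only reflected in the `width` parameter.

```python
def shift_subtract_divide(dividend, divisor, width=16):
    """Model of the Handel-C divider loop; returns (quotient, remainder)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    a = dividend
    b = divisor << (width - 1)    # b = result @ 0: divisor in the top bits
    mult = 1 << (width - 1)       # quotient bit being tried
    result = 0
    while mult != 0:
        if a >= b:
            a -= b                # subtract the shifted divisor
            result |= mult        # and record the quotient bit
        b >>= 1
        mult >>= 1
    return result, a


if __name__ == "__main__":
    print(shift_subtract_divide(100, 7))
```

Each loop iteration produces one quotient bit, so a 16-bit division takes 16 passes through the loop; what is left in `a` at the end is the remainder.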
© Krithi Ramamritham / Kavi Arya
Example 3
void main(void)
{
    chan unsigned int undefined link[2];   // array of channels
    chanin unsigned int 8 input;
    chanout unsigned int 8 output;
    unsigned int undefined state[3];       // array of variables, width inferred

    par                                    // three parallel tasks
    {
        while (1)   // first queue location
        {
            input ? state[0];
            link[0] ! state[0];
        }
        while (1)   // second queue location
        {
            link[0] ? state[1];
            link[1] ! state[1];
        }
        while (1)   // third queue location
        {
            link[1] ? state[2];
            output ! state[2];
        }
    }
}

[Figure: input -> state[0] -> link[0] -> state[1] -> link[1] -> state[2] -> output]
Illustrates: parallel tasks, communication between tasks, array of variables, array of channels, parameterised on width
© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Timing
An assignment statement takes exactly one clock cycle to execute. Everything else is free
void main(void)
{
    unsigned 8 x, y;
    …
    x = x + y;
}
© Krithi Ramamritham / Kavi Arya
Timing/efficiency issues
• One clock source for entire program
– Assignment & delay take one clock cycle
– Expressions are “for free”
• Handel-C designed such that an experienced programmer can immediately tell which instructions execute on which clock cycles
• Example
x = y;
x = (((y*z) + (w*v)) << 2) <- 7;
both statements take one clock cycle
• Clock at longest logic depth
=> reduce the depth of logic to speed up program
=> pipelining
© Krithi Ramamritham / Kavi Arya
Porting “C” to Handel-C
• Decide how software maps to hardware platform
• Partition algorithm between multiple FPGAs
• Port C to Handel-C & use simulator to check correctness
• Modify code to take advantage of extra operators in Handel-C - simulate to ensure correctness
• Add fine-grain parallelism through PAR & parallel assignments or parallelise algorithm - simulate
• Add hardware interfaces for target architecture & map simulator channel communications onto these interfaces - simulate
• Use FPGA place & route tools to generate FPGA images
© Krithi Ramamritham / Kavi Arya
Design Flow Overview
• Port algorithm to Handel-C
• Compile program to .net file for simulator
• Use simulator to evaluate and debug design
• Add interfaces to external hardware
• Use Handel-C compiler to target h/w netlist
• Use FPGA tools to place & route netlist
• Program FPGA with result of place & route
• Modify/debug program (loop back as needed)
© Krithi Ramamritham / Kavi Arya
Essence
• Software approach allows us to rapidly prototype applications for a given domain
• Handel-C provides a seamless approach to derive expressive and fast implementations from the software level
• Cost of silicon is falling
& shortage of trained engineers
& high cost of programmer time
=> Software based, high-level approaches to solving problems become increasingly attractive.
60© Krithi Ramamritham / Kavi Arya
Handel-C Concepts (Recap)
• Describes hardware - h/w design produced = h/w in source program
• Logic gates are assembly instructions of Handel-C system
• Real parallelism – not interpreted
• Assignment, delay take 1 clock cycle; expression evaluation is free
• No side-effects, i.e. a++ is a statement (not an expression as in ‘C’)
• Variable width words => great performance improvement over software
• Min. datapath widths => minimal h/w usage
61© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Concurrency
...
par
{
    { … }
    …
    { … }
}
62© Krithi Ramamritham / Kavi Arya
Concurrency (example)
void main(void)
{
    unsigned 8 x, y;
    unsigned 5 temp1;
    unsigned 4 temp2;
    ...
    temp1 = (0@(x <- 4)) + (0@(y <- 4));
    temp2 = (x \\ 4) + (y \\ 4);
    x = (temp2 + (0@temp1[4])) @ temp1[3:0];
}
63© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Concurrency
...
par
{
temp1=(0@(x<-4))+(0@(y<-4));
temp2=(x\\4)+(y\\4);
}
x=(temp2+(0@temp1[4]))@temp1[3:0];
...
64© Krithi Ramamritham / Kavi Arya
Features & Statements (contd.)
• Delay
...
par
{
    x = 1;
    { delay; x = 2; }
}

while (x == 0) delay;
65© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
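• Channel
unsigned int 8 a;
chan unsigned int 8 c;
c ! 5;
c ? a;

A single variable or channel must not be written by more than one parallel (//) branch:
par
{
    out ! 3;
    out ! 4;
} // illegal

[Figure: one statement sends with c ! 5, the channel carries the value, and another statement receives it with c ? a]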
66© Krithi Ramamritham / Kavi Arya
Features & Statements(contd.)
• Macros (Examples - contd)
– Combinatorial
macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a));
shared expr incwrap(e, m) = (((e) == (m)) ? 0 : (e) + 1);
– Recursive
macro expr copy(e, n) = select(n == 1, (e), copy(e, n/2) @ copy(e, n-(n/2)));
67© Krithi Ramamritham / Kavi Arya
Features & Statements(contd)
• Operators for Bit Manipulation
z = x <- 2;      // Take least significant bits
z = y \\ 2;      // Drop least significant bits
z = x @ y;       // Concatenation
z = x[3];        // Bit selection
z = y[3:2];      // Bus selection
z = width(x);    // Width of expression

Note: in the form y[m:n] the order is MSB:LSB

unsigned int 3 y = 4;
y[0] is 0;
y[2] is 1;
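For readers more used to software, the bit-manipulation operators correspond to ordinary masks and shifts. The sketch below is a rough Python analogue; the function names are illustrative, and because Python integers carry no width, the width of the right-hand operand of `@` has to be passed explicitly.

```python
def take(value, n):
    """x <- n : keep the n least significant bits."""
    return value & ((1 << n) - 1)


def drop(value, n):
    """x \\\\ n : discard the n least significant bits."""
    return value >> n


def concat(high, low, low_width):
    """high @ low : place `high` above the `low_width` bits of `low`."""
    return (high << low_width) | take(low, low_width)


def bit(value, index):
    """x[index] : a single bit, index 0 being the least significant."""
    return (value >> index) & 1


def bits(value, msb, lsb):
    """x[msb:lsb] : a range of bits, written MSB first."""
    return take(value >> lsb, msb - lsb + 1)


if __name__ == "__main__":
    y = 4   # binary 100, as in the slide's 3-bit example
    print(bit(y, 0), bit(y, 2))
```

In hardware none of these operators costs any logic: they only select or regroup wires, which is why Handel-C encourages their use.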
68© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• External RAM / ROM
ram unsigned int 4 ExtRAM[8] with {offchip = 1,
data = {"P01", "P02", "P03", "P04"},
addr = {"P05", "P06", "P07"},
we = {"P08"}, oe = {"P09"}, cs = {"P10"} };
rom unsigned int 4 ExtROM[8] with {offchip = 1,
data = {"P01", "P02", "P03", "P04"},
addr = {"P05", "P06", "P07"},
we = {}, oe = {"P09"}, cs = {"P10"} };
69© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Internal RAM / ROM
ram unsigned int 8 speicher[256];
rom unsigned int 8 program[] = {1,2,3,4};
unsigned char i;
i = 3;
speicher[i] = 25;
for (i = 0; i < 4; i++) stdout ! program[i];
70© Krithi Ramamritham / Kavi Arya
Recursive Macro Expressions – Example
• Illustrates the generation of large quantities of hardware from simple macros.
• Multiplier whose width depends on the parameters of the macro.
• Starting point for generating large regular hardware structures using macros.
• Single-cycle long multiplication from single macro:
macro expr multiply(x, y) =
    select(width(x) == 0, 0,
           multiply(x \\ 1, y << 1) + (x[0] == 1 ? y : 0));

a = multiply(b, c);
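The recursion in the macro is the familiar shift-and-add multiplication, unrolled at compile time into one adder per bit of x. A software model is sketched below; the explicit `width` argument stands in for `width(x)`, which Python cannot infer, and the model ignores the truncation to a fixed result width that the hardware would apply.

```python
def multiply(x, y, width):
    """Shift-and-add product, recursing once per bit of x (LSB first)."""
    if width == 0:                       # select(width(x) == 0, 0, ...)
        return 0
    partial = y if x & 1 else 0          # (x[0] == 1 ? y : 0)
    return multiply(x >> 1, y << 1, width - 1) + partial


if __name__ == "__main__":
    print(multiply(13, 11, 4))
```

In Handel-C the whole expansion is a single expression, so the product is available in one clock cycle, at the price of a long combinational path through all the adders.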
71© Krithi Ramamritham / Kavi Arya
Timing
72© Krithi Ramamritham / Kavi Arya
Additional Features & Statements
• Off-Chip Interface
– Input, registered Input, latched Input
– Output
– Tristate Bus
• Off-Chip Interface (examples)

interface bus_in (int 4) InBus() with
    {data = {"P1", "P2", "P3", "P4"} };

int 4 x;
x = InBus.in;

interface bus_out () OutBus (x+y) with
    {data = {"P11", "P12", "P13", "P14"} };
73© Krithi Ramamritham / Kavi Arya
Parallel Access to Variables
• Rules of parallelism: the same variable must not be accessed from two separate parallel branches (to avoid resource conflicts on the variables)
• Actually, the same variable must not be assigned to more than once on the same clock cycle but may be read as often as required (see wires!)
• Allows some useful and powerful programming techniques, e.g.:
par
{
    a = b;
    b = a;
} // swaps values of a and b in a single clock cycle.
74© Krithi Ramamritham / Kavi Arya
Parallel Access to Variables
• Four place queue:
while (1)
{
    par
    {
        int x[3];
        x[0] = in;
        x[1] = x[0];
        x[2] = x[1];
        out = x[2];   // values at “out” delayed by 4 clock cycles
    }
}
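Both the swap and the four place queue rely on the same rule: inside a `par` block every right-hand side is read before any variable is updated at the clock edge. The sketch below models that rule in software; `par_step` and `run_queue` are illustrative names, and the dictionary-based state is an assumption of the model, not part of Handel-C.

```python
def par_step(state, assignments):
    """One clock cycle: evaluate every right-hand side on the old state, then commit."""
    new_values = {target: expr(state) for target, expr in assignments.items()}
    return {**state, **new_values}


def run_queue(samples):
    """Feed one sample per clock into the four place queue and record `out`."""
    state = {"in": None, "x0": None, "x1": None, "x2": None, "out": None}
    queue = {
        "x0": lambda s: s["in"],
        "x1": lambda s: s["x0"],
        "x2": lambda s: s["x1"],
        "out": lambda s: s["x2"],
    }
    seen = []
    for sample in samples:
        state["in"] = sample
        state = par_step(state, queue)
        seen.append(state["out"])
    return seen


if __name__ == "__main__":
    swapped = par_step({"a": 1, "b": 2},
                       {"a": lambda s: s["b"], "b": lambda s: s["a"]})
    print(swapped)
    print(run_queue([1, 2, 3, 4, 5, 6]))
```

In this model the first sample reaches `out` on the fourth clock cycle, matching the "delayed by 4 clock cycles" comment in the Handel-C code.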
75© Krithi Ramamritham / Kavi Arya
Time Efficiency of Handel-C Hardware
• Requirement: the clock period of the program must be longer than the longest path through combinatorial logic in the whole program.
• => once FPGA place and route is done, max. clock rate = 1/longest-path-delay
• Example: FPGA place and route tools calculate that the longest path delay between flip-flops in a design is 70 ns.
• The max. clock rate is then 1/70 ns = 14.3 MHz.
• Speed allowed by system: 400 kHz - 100 MHz
• BUT WHAT IF THIS IS NOT FAST ENOUGH?
76© Krithi Ramamritham / Kavi Arya
Improving Time Efficiency
• Reducing Logic Depth
Avoid multiplication, avoid wide adders, reduce complex expressions into stages, etc.

unsigned 8 x;
unsigned 8 y;
unsigned 5 temp1;
unsigned 4 temp2;
par
{
    temp1 = (0@(x<-4)) + (0@(y<-4));
    temp2 = (x \\ 4) + (y \\ 4);
}
x = (temp2+(0@temp1[4])) @ temp1[3:0];

• Pipelining => increased latency for higher throughput
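The two-stage addition can be checked against a plain 8-bit add. The sketch below models the Handel-C statements with masks and shifts; `split_add` is an illustrative name, and the masks stand in for the declared widths of `temp1` (5 bits), `temp2` (4 bits) and `x` (8 bits).

```python
def split_add(x, y):
    """8-bit add done as two 4-bit adds plus a carry fix-up."""
    temp1 = (x & 0xF) + (y & 0xF)             # low nibbles, 5-bit result with carry
    temp2 = ((x >> 4) + (y >> 4)) & 0xF       # high nibbles, truncated to 4 bits
    high = (temp2 + (temp1 >> 4)) & 0xF       # add the carry bit temp1[4]
    return (high << 4) | (temp1 & 0xF)        # high @ temp1[3:0]


if __name__ == "__main__":
    print(split_add(200, 100), (200 + 100) & 0xFF)
```

The result is identical to the one-cycle 8-bit add for every pair of inputs; the gain is that each clock cycle now contains only a 4-bit carry chain, at the cost of one extra cycle.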
© Krithi Ramamritham / Kavi Arya
Plan
• Embedded Systems
• New Approaches to building ESW
– New paradigms: Lava, Handel-C
– Examples + “Engineering Returns to Software”
– Build a RISC processor in 48hrs
– Advantages of reconfigurable hardware.
• Real-time support for ESW
80© Krithi Ramamritham / Kavi Arya
RISC-Processor
• Features:
– 16 instructions
– 4 bit I/O Ports
– one accumulator
– Program memory (16x8 ROM)
– Data memory (16x4 RAM)
• Problem: execute a program stored in ROM to calculate the first few members of the Fibonacci number sequence.
1, 2, 3, 5, 8, 13, 21, 34, …
fib(n) = 1 if n=0 or n=1
fib(n) = fib(n-1) + fib(n-2) if n>=2
81© Krithi Ramamritham / Kavi Arya
RISC-Processor
• Instruction Set
82© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.)
• Program:
chanin input;
chanout output;

// Parameterisation
#define dw 32      /* Data width */
#define opcw 4     /* Op-code width */
#define oprw 4     /* Operand width */
#define rom_aw 4   /* Width of ROM address bus */
#define ram_aw 4   /* Width of RAM address bus */

// The opcodes
#define HALT 0
#define LOAD 1
#define LOADI 2
#define STORE 3
#define ADD 4
#define SUB 5
#define JUMP 6
#define JUMPNZ 7
#define INPUT 8
#define OUTPUT 9

// The assembler macro
#define _asm_(opc, opr) (opc + (opr << opcw))
85© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.)
• Program (cont):
// Rom program data
rom unsigned int undefined program[] =
{
    _asm_(LOADI, 1),   /* 0 */ /* Get a one */
    _asm_(STORE, 3),   /* 1 */ /* Store this */
    _asm_(STORE, 1),   /* 2 */
    _asm_(INPUT, 0),   /* 3 */ /* Read value from user */
    _asm_(STORE, 2),   /* 4 */ /* Store this */
    _asm_(LOAD, 1),    /* 5 */ /* Loop entry point */
    _asm_(ADD, 0),     /* 6 */ /* Make a fib number */
    _asm_(STORE, 0),   /* 7 */ /* Store it */
    _asm_(OUTPUT, 0),  /* 8 */ /* Output it */
    _asm_(ADD, 1),     /* 9 */ /* Make a fib number */
    _asm_(STORE, 1),   /* a */ /* Store it */
    _asm_(OUTPUT, 0),  /* b */ /* Output it */
    _asm_(LOAD, 2),    /* c */ /* Decrement counter */
    _asm_(SUB, 3),     /* d */
    _asm_(JUMPNZ, 4),  /* e */ /* Repeat if not zero */
    _asm_(HALT, 0)     /* f */
};
86© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.)
• Program (cont):
/* RAM for processor */
ram unsigned int dw data[1 << ram_aw];

/* Processor registers */
unsigned int rom_aw pc;          /* Program counter */
unsigned int (opcw+oprw) ir;     /* Instruction register */
unsigned int dw x;               /* Accumulator */

/* Macros to extract opcode and operand fields */
#define opcode (ir <- opcw)
#define operand (ir \\ opcw)
87© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.)
• Program (cont):
/* Main program */
void main(void)
{
    pc = 0;
    // Processor loop
    do
    {
        // fetch
        par
        {
            ir = program[pc];
            pc = pc + 1;
        }
        /* === MAIN DECODE/EXECUTE === */
    } while (opcode != HALT);
} /* main program */
88© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.)
• Program (cont):
// decode and execute
switch (opcode)
{
    case LOAD   : x = data[operand<-ram_aw]; break;
    case LOADI  : x = 0 @ operand; break;
    case STORE  : data[operand<-ram_aw] = x; break;
    case ADD    : x = x+data[operand<-ram_aw]; break;
    case SUB    : x = x-data[operand<-ram_aw]; break;
    case JUMP   : pc = operand<-rom_aw; break;
    case JUMPNZ : if (x!=0) pc=operand<-rom_aw; break;
    case INPUT  : input ? x; break;
    case OUTPUT : output ! x; break;
    default     : while(1) delay;   // unknown opcode
}
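To see what the ROM program computes, the fetch/decode/execute loop can be mirrored in a small software simulator. The sketch below reuses the opcodes, the `_asm_` encoding and the ROM contents from the slides; the function `run`, the zero-initialised RAM and the list-based input are assumptions of the model, and HALT simply ends the simulation.

```python
HALT, LOAD, LOADI, STORE, ADD, SUB, JUMP, JUMPNZ, INPUT, OUTPUT = range(10)
OPCW = 4  # op-code width, as in the slides


def asm(opc, opr):
    """Same encoding as the _asm_ macro: opcode in the low bits, operand above."""
    return opc + (opr << OPCW)


PROGRAM = [
    asm(LOADI, 1), asm(STORE, 3), asm(STORE, 1), asm(INPUT, 0),
    asm(STORE, 2), asm(LOAD, 1), asm(ADD, 0), asm(STORE, 0),
    asm(OUTPUT, 0), asm(ADD, 1), asm(STORE, 1), asm(OUTPUT, 0),
    asm(LOAD, 2), asm(SUB, 3), asm(JUMPNZ, 4), asm(HALT, 0),
]


def run(program, inputs, data_width=32):
    """Execute the program; returns the list of values sent to the output channel."""
    mask = (1 << data_width) - 1
    data = [0] * 16            # assumption: RAM starts out as all zeros
    pending = list(inputs)
    outputs = []
    pc, x = 0, 0
    while True:
        ir = program[pc]                       # fetch
        pc = (pc + 1) % len(program)
        opcode = ir & ((1 << OPCW) - 1)        # ir <- opcw
        operand = ir >> OPCW                   # ir \\ opcw
        if opcode == HALT:
            return outputs
        elif opcode == LOAD:
            x = data[operand]
        elif opcode == LOADI:
            x = operand
        elif opcode == STORE:
            data[operand] = x
        elif opcode == ADD:
            x = (x + data[operand]) & mask
        elif opcode == SUB:
            x = (x - data[operand]) & mask
        elif opcode == JUMP:
            pc = operand
        elif opcode == JUMPNZ:
            if x != 0:
                pc = operand
        elif opcode == INPUT:
            x = pending.pop(0)
        elif opcode == OUTPUT:
            outputs.append(x)
        else:
            raise ValueError(f"unknown opcode {opcode}")


if __name__ == "__main__":
    print(run(PROGRAM, [4]))
```

The value read from the input is a loop counter, and each pass through the loop outputs two numbers, so an input of 4 yields the eight values listed on the problem slide.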
89© Krithi Ramamritham / Kavi Arya
RISC-Processor (cont.) • The Final Program!
90© Krithi Ramamritham / Kavi Arya
Simulation & debugging
• The simulator is integrated into the compiler.
• Executing a cycle-based simulation.
• Variables are traceable at any clock cycle.
• Port interface will be replaced by standard I/O.
• Handel-C simulator supports debugging at any clock cycle.
• Highlighting of characteristic values, e.g. area of any program line.
91© Krithi Ramamritham / Kavi Arya
Some Recent Work
• “Customising Graphics Applications: Techniques & Programming Interface”, Henry Styles & Wayne Luk, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.
• Exploit custom data formats and datapath widths to optimise graphics operations such as texture mapping & hidden-surface removal.
• Discusses techniques for balancing the graphics pipeline
• Customised architectures captured in Handel-C, compiled for Xilinx Virtex FPGAs
• Handel-C API based on the OpenGL standard for automatic speedup of graphics applications, including the Quake-2 action game.
92© Krithi Ramamritham / Kavi Arya
The Graphics Pipeline
93© Krithi Ramamritham / Kavi Arya
Performance Case Studies
• Geometric Visualisation

Implementation Medium | Clock rate (MHz) | Frame rate (FPS) | Cost
Software on PC        | 400              | 24               | $1,000
Xilinx XCV1000        | 40               | 41               | $4,000
Nvidia TNT2 Ultra     | 170              | 55               | $200

Nvidia is a 3-D graphics chipset, i.e. a specialised graphics ASIC
Chart => FPGA platform fast approaching performance of dedicated graphics ASIC for general-purpose graphics applications
94© Krithi Ramamritham / Kavi Arya
Performance Case Studies
• Infrared Simulation requires custom pixel format not supported by graphics ASICs

Implementation Medium | Clock rate (MHz) | Frame rate (FPS) | Cost
Software on PC        | 400              | 96               | $1,000
Xilinx XCV1000        | 40               | 330              | $4,000
SGI Onyx2 Reality     | 180              | 2750             | $180,000

Onyx contains two 180 MHz MIPS processors, two Geometry Engine processors and two rasteriser ASICs, with a memory bandwidth of 6.4 GB/sec (i.e. 10X cost & mem. b/w of FPGA)
95© Krithi Ramamritham / Kavi Arya
Performance Case Studies
• Quake-2 benchmark requires custom pixel format not supported by graphics ASICs

Demonstration | Software (fps) | XCV1000 (fps) | ASIC (fps)
Demo.dm1      | 0.2            | 14.4          | 71.6
Jail5059.dm2  | 0.2            | 15.0          | 72.6
jail3A020.dm2 | 0.3            | 15.6          | 71.5

Bottleneck is PCI-bus speed limitation.
Improve performance by moving FPGA to AGP slot, allowing 1 GB/sec transfers between graphics h/w and memory
96© Krithi Ramamritham / Kavi Arya
Some Observations
• FPGA renderer is a low-cost platform for custom graphics applications
• Development time of a customised FPGA renderer is comparable to that of optimised software => effective to use a reconfigurable platform
• Good for reconfigurable designs where ASIC is not available or too expensive
• Useful in exploring desirable algorithms and architectures for ASICs
• Hardware renderer may be customised to maximise performance for each application
97© Krithi Ramamritham / Kavi Arya
Some Features of the Rapid Prototyping Board
• Full length 32 bit PCI card
• Virtex XCV1000: 1,000,000 system gates
• 131 kBit Block RAM, 393 kBit SelectRAM
• Programmable clock 400 kHz to 100 MHz
• 4 banks of fast asynchronous 32 bit wide SRAM, each 2 Mbytes
• PCI interface: 32 bit, 33 MHz, 132 Mbytes/sec burst
• 2 x PMC sites for VME grade I/O & processing modules
• 50 pin Aux I/O, 8 LEDs
98© Krithi Ramamritham / Kavi Arya
Summary
• Cost of silicon is falling
& products are getting more complex
& time-to-market is shrinking rapidly
& shortage of trained engineers
& cost of programmer time is a major constraint
=> Software based, high-level approaches to solving problems become increasingly attractive.
• New generation of languages let us build systems at high level of abstraction.
• High-density FPGAs and SoCs allow complex designs to be rapidly prototyped => reduce the development cycle of new technology – perhaps even to deploy final product as “soft cores”.
• Broader understanding demanded from system designer – need “Renaissance Engineer” with equal understanding of hardware and software.
99© Krithi Ramamritham / Kavi Arya
Plan
• Embedded Systems
• New Approaches to building ESW
• Real-Time Support
– Special Characteristics of Real-Time Systems
– Real-Time Constraints
– Canonical Real-Time Applications
– Scheduling in Real-time systems
– Operating System Approaches
100© Krithi Ramamritham / Kavi Arya
What is “real” about real-time?
computer world                         real world
e.g., PC                               e.g., industrial system, airplane
average response for user,             events occur in environment at own speed
  interactive
occasionally longer reaction           reaction too slow: deadline miss
reaction: user annoyed                 reaction: damage, potential loss of human life
computer controls speed of user        computer must follow speed of environment
“computer time”                        “real-time”
101© Krithi Ramamritham / Kavi Arya
Real-Time Systems
A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.
[Figure: a real-time computing system with I/O data flowing in and out; an event arrives and the corresponding action follows along the time axis]
102© Krithi Ramamritham / Kavi Arya
CLIENT SERVER
Flight Avionics
Constraints on responses to pilot inputs, aircraft state updates
103© Krithi Ramamritham / Kavi Arya
Constraints:
– Keep plastic at proper temperature (liquid, but not boiling)
– Control injector solenoid (make sure that the motion of the piston reaches the end of its travel)
104© Krithi Ramamritham / Kavi Arya
Real-Time Systems: Properties of Interest
• Safety: Nothing bad will happen.
• Liveness: Something good will happen.
• Timeliness: Things will happen on time -- by their deadlines, periodically, ....
106© Krithi Ramamritham / Kavi Arya
Performance Metrics in Real-Time Systems
• Beyond minimizing response times and increasing the throughput:
– achieve timeliness.
• More precisely, how well can we predict that deadlines will be met?
107© Krithi Ramamritham / Kavi Arya
Types of RT Systems
Dimensions along which real-time activities can be categorized:
• how tight are the deadlines? -- deadlines are tight when the laxity (deadline minus computation time) is small.
• how strict are the deadlines? what is the value of executing an activity after its deadline?
• what are the characteristics of the environment? how static or dynamic must the system be?
Designers want their real-time system to be fast, predictable, reliable, flexible.
108© Krithi Ramamritham / Kavi Arya
Hard, soft, firm
• Hard: result useless or dangerous if deadline exceeded
• Soft: result of some (lower) value if deadline exceeded
• Firm: value drops to zero at the deadline
• Deadline intervals: result required not later and not before
[Figure: value of the result (+ / -) plotted against time, with the hard and soft curves diverging at the deadline (dl)]
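The three deadline types can be read as value functions of completion time. The following is a minimal sketch of that reading; the function name `result_value`, the linear decay for soft deadlines, and the numeric penalty for hard deadlines are illustrative assumptions, not taken from the slides.

```python
def result_value(kind, completion, deadline,
                 full_value=1.0, decay=0.1, penalty=-10.0):
    """Value of a result delivered at `completion` (illustrative shapes only)."""
    if completion <= deadline:
        return full_value                 # on time: full value in every case
    late = completion - deadline
    if kind == "hard":
        return penalty                    # late result is useless or dangerous
    if kind == "firm":
        return 0.0                        # value drops to zero at the deadline
    if kind == "soft":
        # assumed linear decay: a late result keeps some, lower, value
        return max(0.0, full_value - decay * late)
    raise ValueError(f"unknown deadline kind: {kind}")


if __name__ == "__main__":
    for kind in ("hard", "firm", "soft"):
        print(kind, result_value(kind, completion=12, deadline=10))
```

Only the shape matters here: a negative value for a missed hard deadline, zero for firm, and a gradually falling value for soft.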
109© Krithi Ramamritham / Kavi Arya
Examples
• Hard real time systems
– Aircraft
– Airport landing services
– Nuclear Power Stations
– Chemical Plants
– Life support systems
• Soft real time systems
– Multimedia
– Interactive video games
110© Krithi Ramamritham / Kavi Arya
Real-Time: Items and Terms
Task
– program, perform service, functionality
– requires resources, e.g., execution time
Deadline
– specified time for completion of, e.g., task
– time interval or absolute point in time
– value of result may depend on completion time
111© Krithi Ramamritham / Kavi Arya
Plan
• Special Characteristics of Real-Time Systems
• Real-Time Constraints
• Canonical Real-Time Applications
• Scheduling in Real-time systems
• Operating System Approaches
112© Krithi Ramamritham / Kavi Arya
Timing Constraints
Real-time means to be in time --- how do we know something is “in time”? how do we express that?
• Timing constraints are used to specify temporal correctness, e.g., “finish assignment by 2pm”, “be at station before train departs”.
• A system is said to be (temporally) feasible if it meets all specified timing constraints.
• Timing constraints do not come out of thin air: the design process identifies events, then derives, models, and finally specifies timing constraints.
113© Krithi Ramamritham / Kavi Arya
• Periodic
– activity occurs repeatedly
– e.g., to monitor environment values, temperature, etc.
[Figure: periodic arrivals on a time axis, separated by a fixed period]
114© Krithi Ramamritham / Kavi Arya
• Aperiodic
– can occur any time
– no arrival pattern given
[Figure: aperiodic arrivals at irregular points on a time axis]
115© Krithi Ramamritham / Kavi Arya
• Sporadic
– can occur any time, but
– minimum time between arrivals
[Figure: sporadic arrivals on a time axis, separated by at least the minimum inter-arrival time (mint)]
116© Krithi Ramamritham / Kavi Arya
Who initiates (triggers) actions?
Example: Chemical process
– controlled so that temperature stays below danger level
– warning is triggered before danger point, so that cooling can still occur
Two possibilities:
– event triggered: action whenever temp rises above warn
– time triggered: look every int time intervals; action when the measured temp is above warn
117© Krithi Ramamritham / Kavi Arya
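[Figure: time-triggered (TT) and event-triggered (ET) invocations shown along a time axis, with a point t marked]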
119© Krithi Ramamritham / Kavi Arya
ET vs TT
• Time triggered
– Stable number of invocations
• Event triggered
– Only invoked when needed
– High number of invocations and computation demands if value changes frequently
121© Krithi Ramamritham / Kavi Arya
Other Issues to worry about
• Meet requirements -- some activities may run only:
– after others have completed - precedence constraints
– while others are not running - mutual exclusion
– within certain times - temporal constraints
• Scheduling
– planning of activities, such that required timing is kept
• Allocation
– where should a task execute?
122© Krithi Ramamritham / Kavi Arya
Plan
• Special Characteristics of Real-Time Systems
• Real-Time Constraints
• Canonical Real-Time Applications
• Scheduling in Real-time systems
• Operating System Approaches
123© Krithi Ramamritham / Kavi Arya
A Typical Real time system
[Figure: a temperature sensor feeds an input port; CPU and memory process the reading; an output port drives the heater]
124© Krithi Ramamritham / Kavi Arya
Code for example
While true do
{
    read temperature sensor
    if temperature too high
        then turn off heater
    else if temperature too low
        then turn on heater
    else nothing
}
125© Krithi Ramamritham / Kavi Arya
Comment on code
• Code works by polling the device (temperature sensor)
• Code is in the form of an infinite loop
• No other tasks can be executed
• Suitable for a dedicated system or sub-system only
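The polling loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_temperature` and `set_heater` are hypothetical callables standing in for the input and output ports, the thresholds `low` and `high` are made up, and the loop is bounded by `iterations` so that the sketch terminates (the slide's version loops forever).

```python
def control_step(temperature, low, high, heater_on):
    """One pass of the loop body: decide the heater state from one reading."""
    if temperature > high:
        return False          # too high: turn heater off
    if temperature < low:
        return True           # too low: turn heater on
    return heater_on          # in range: do nothing


def polling_loop(read_temperature, set_heater, low, high, iterations):
    """Poll the sensor repeatedly; nothing else runs while this loop runs."""
    heater_on = False
    for _ in range(iterations):
        heater_on = control_step(read_temperature(), low, high, heater_on)
        set_heater(heater_on)
    return heater_on


if __name__ == "__main__":
    readings = iter([15, 20, 30])          # simulated sensor values
    states = []
    polling_loop(lambda: next(readings), states.append, low=18, high=25, iterations=3)
    print(states)
```

Separating `control_step` from the loop keeps the decision logic testable without any hardware attached.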
126© Krithi Ramamritham / Kavi Arya
Extended polling example
[Figure: one computer polls Temperature Sensors 1-4 and drives Heaters 1-4; Tasks 1-4 each form a conceptual link between a sensor and its heater]
127© Krithi Ramamritham / Kavi Arya
Polling
• Problems
– Arranging task priorities
– Round robin is usual within a priority level
– Urgent tasks are delayed
128© Krithi Ramamritham / Kavi Arya
Interrupt driven systems
• Advantages
– Fast
– Little delay for high priority tasks
• Disadvantages
– Programming
– Code difficult to debug
– Code difficult to maintain
129© Krithi Ramamritham / Kavi Arya
How can we monitor a sensor every 100 ms?
Initiate a task T1 to handle the sensor
T1:
Loop
{
    Do sensor task T2
    Schedule T2 for +100 ms
}
Note that the time could be relative (as here) or could be an actual time. There would be slight differences between the methods, due to the additional time needed to execute the code.
130© Krithi Ramamritham / Kavi Arya
An alternative…
Initiate a task T1 to handle the sensor
T1:
Do sensor task T2
Repeat
{
    Schedule T2 for n * 100 ms
    n := n + 1
}
There are some subtleties here...
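One of the subtleties is drift. With relative scheduling ("+100 ms"), the time spent executing the code is added to every interval, so releases slide later and later; with absolute scheduling ("n * 100 ms") they do not. The sketch below simulates both with integer milliseconds; the function names and the fixed per-iteration `overhead` are assumptions made for illustration.

```python
def release_times_relative(period, overhead, count):
    """Next release = time the current run finished + period (drift accumulates)."""
    t, releases = 0, []
    for _ in range(count):
        releases.append(t)
        t = t + overhead + period      # code execution time delays every later release
    return releases


def release_times_absolute(period, count):
    """Release n is scheduled for n * period, independent of execution time."""
    return [n * period for n in range(count)]


if __name__ == "__main__":
    print(release_times_relative(period=100, overhead=2, count=4))
    print(release_times_absolute(period=100, count=4))
```

With a 2 ms overhead the relative version is already 6 ms late by the fourth release, while the absolute version stays on the 100 ms grid (as long as each run finishes before the next release).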
131© Krithi Ramamritham / Kavi Arya
Clock, interrupts, tasks
[Figure: the clock interrupts the processor; the processor examines the job/task queue holding Task 1 to Task 4]
Tasks schedule events using the clock...
132© Krithi Ramamritham / Kavi Arya
Plan
• Special Characteristics of Real-Time Systems
• Real-Time Constraints
• Canonical Real-Time Applications
• Scheduling in Real-time systems
• Operating System Approaches
133© Krithi Ramamritham / Kavi Arya
Why is scheduling important?
Definition:
A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.
[Figure: a real-time computing system with I/O data flowing in and out; an event arrives and the corresponding action follows along the time axis]
134© Krithi Ramamritham / Kavi Arya
Schedulability analysis
a.k.a. feasibility checking:
check whether tasks will meet their
timing constraints.
135© Krithi Ramamritham / Kavi Arya
Scheduling Paradigms
Four scheduling paradigms emerge, depending on
• whether a system performs schedulability analysis
• if it does,
– whether it is done statically or dynamically
– whether the result of the analysis itself produces a schedule or plan according to which tasks are dispatched at run-time.
136© Krithi Ramamritham / Kavi Arya
1. Static Table-Driven Approaches
• Perform static schedulability analysis by checking if a schedule is derivable.
• The resulting schedule (table) identifies the start times of each task.
• Applicable to tasks that are periodic (or have been transformed into periodic tasks by well known techniques).
• This is highly predictable but highly inflexible.
• Any change to the tasks and their characteristics may require a complete overhaul of the table.
137© Krithi Ramamritham / Kavi Arya
2. Static Priority Driven Preemptive Approaches
• Tasks have -- systematically assigned -- static priorities.
• Priorities take timing constraints into account:
– e.g. RMA: Rate-Monotonic --- the shorter the period, the higher the priority.
– e.g. EDF: Earliest-deadline-first --- the earlier the deadline, the higher the priority. (Strictly, EDF priorities follow each job's deadline, so they change at run time rather than being fixed per task.)
• Perform static schedulability analysis, but no explicit schedule is constructed
– RMA: sum of task utilizations <= n(2^(1/n) - 1), which tends to ln 2 (about 0.69) as the number of tasks n grows.
– EDF: sum of task utilizations <= 1
• At run-time, tasks are executed highest-priority-first, with preemptive-resume policy.
• When resources are used, need to compute worst-case blocking times.
Task utilization = computation-time / period
138© Krithi Ramamritham / Kavi Arya
Static Priorities: Rate Monotonic Analysis
presented by Liu and Layland in 1973
Assumptions
• Tasks are periodic with deadline equal to period. Release time of tasks is the period start time.
• Tasks do not suspend themselves
• Tasks have bounded execution time
• Tasks are independent
• Scheduling overhead negligible
139© Krithi Ramamritham / Kavi Arya
RMA: Design Time vs. Run Time
At Design Time:
Task priorities are assigned according to their periods; shorter period means higher priority.
Schedulability test: a taskset of n tasks, with computation times C_i and periods T_i, is schedulable if
$\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\,(2^{1/n} - 1)$
Very simple test, easy to implement.
Run-time:
The ready task with the highest priority is executed.
140© Krithi Ramamritham / Kavi Arya
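RMA: Example
taskset: t1, t2, t3, t4, each given as (period, computation time)
t1 = (3, 1)  t2 = (6, 1)  t3 = (5, 1)  t4 = (10, 2)
The schedulability test: 1/3 + 1/6 + 1/5 + 2/10 ≤ 4 (2^(1/4) - 1) ?
0.9 ≤ 0.757 ?
…. no: the taskset fails the test, so this test cannot show it to be schedulable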
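The utilization test for this four-task example can be checked with a few lines of Python. This is a minimal sketch; the function name `rm_utilization_test` and the `(period, computation)` tuple layout are assumptions chosen to match the notation of the example.

```python
def rm_utilization_test(tasks):
    """Liu and Layland utilization bound test for rate-monotonic scheduling.

    tasks: list of (period, computation_time) pairs.
    Returns (total utilization, bound, whether the test passes).
    """
    n = len(tasks)
    utilization = sum(c / t for t, c in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization, bound, utilization <= bound


if __name__ == "__main__":
    example = [(3, 1), (6, 1), (5, 1), (10, 2)]   # t1, t2, t3, t4
    u, bound, passes = rm_utilization_test(example)
    print(f"U = {u:.3f}, bound = {bound:.3f}, passes = {passes}")
```

A failed test is inconclusive rather than a proof of infeasibility, because the bound is a sufficient condition only.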
141© Krithi Ramamritham / Kavi Arya
RMA…
A schedulability test is
• Sufficient: tasksets that pass are schedulable, but there may exist tasksets that fail the test and are still schedulable
• Necessary: tasksets that fail are (definitely) not schedulable
The RMA schedulability test is sufficient, but not necessary.
e.g., when periods are harmonic, i.e., multiples of each other, utilization can be 1.
142© Krithi Ramamritham / Kavi Arya
Exact RMA
by Joseph and Pandya, based on critical instant analysis
(longest response time of a task occurs when it is released at the same time as all higher priority tasks)
What is happening at the critical instant?
• Let T1 be the highest priority task. Its response time
R1 = C1 since it cannot be preempted
• What about T2?
R2 = C2 + delays due to interruptions by T1.
Since T1 has higher priority, it has a shorter period. That means it will interrupt T2 at least once, probably more often. Assume T1 has half the period of T2: T1 is then released at most twice while T2 is pending, so R2 ≤ C2 + 2 x C1
143© Krithi Ramamritham / Kavi Arya
Exact RMA….
In general:
$R_i^{n+1} = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{n}}{T_j} \right\rceil C_j$
$R_i^{n}$ denotes the nth iteration of the response time of task i
hp(i) is the set of tasks with higher priority than task i
144© Krithi Ramamritham / Kavi Arya
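Example - Exact Analysis
Let us look at our example, which failed the pure rate monotonic test although we can schedule it. Exact analysis says so.
• R1 = 1; easy
• R3, second highest priority task
hp(t3) = {T1}
$R_3^{0} = C_3 = 1$
$R_3^{1} = C_3 + \lceil R_3^{0}/T_1 \rceil C_1 = 1 + \lceil 1/3 \rceil \cdot 1 = 2$
$R_3^{2} = C_3 + \lceil R_3^{1}/T_1 \rceil C_1 = 1 + \lceil 2/3 \rceil \cdot 1 = 2$
$R_3^{2} = R_3^{1}$, so R3 = 2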
145© Krithi Ramamritham / Kavi Arya
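• R2, third highest priority task
hp(t2) = {T1, T3}
$R_2^{0} = C_2 = 1$
$R_2^{1} = C_2 + \lceil R_2^{0}/T_1 \rceil C_1 + \lceil R_2^{0}/T_3 \rceil C_3 = 1 + \lceil 1/3 \rceil \cdot 1 + \lceil 1/5 \rceil \cdot 1 = 3$
$R_2^{2} = C_2 + \lceil R_2^{1}/T_1 \rceil C_1 + \lceil R_2^{1}/T_3 \rceil C_3 = 1 + \lceil 3/3 \rceil \cdot 1 + \lceil 3/5 \rceil \cdot 1 = 3$
$R_2^{2} = R_2^{1}$, so R2 = 3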
146© Krithi Ramamritham / Kavi Arya
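• R4, lowest priority task
hp(t4) = {T1, T3, T2}
$R_4^{0} = C_4 = 2$
$R_4^{1} = C_4 + \lceil 2/3 \rceil C_1 + \lceil 2/6 \rceil C_2 + \lceil 2/5 \rceil C_3 = 2 + 1 + 1 + 1 = 5$
$R_4^{2} = C_4 + \lceil 5/3 \rceil C_1 + \lceil 5/6 \rceil C_2 + \lceil 5/5 \rceil C_3 = 2 + 2 + 1 + 1 = 6$
$R_4^{3} = C_4 + \lceil 6/3 \rceil C_1 + \lceil 6/6 \rceil C_2 + \lceil 6/5 \rceil C_3 = 2 + 2 + 1 + 2 = 7$
$R_4^{4} = C_4 + \lceil 7/3 \rceil C_1 + \lceil 7/6 \rceil C_2 + \lceil 7/5 \rceil C_3 = 2 + 3 + 2 + 2 = 9$
$R_4^{5} = C_4 + \lceil 9/3 \rceil C_1 + \lceil 9/6 \rceil C_2 + \lceil 9/5 \rceil C_3 = 2 + 3 + 2 + 2 = 9$
$R_4^{5} = R_4^{4}$, so R4 = 9
Response times of the first instances of all tasks ≤ their periods => taskset feasible under RM scheduling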
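The fixed-point iteration used in this example can be sketched in Python. This is an illustrative implementation, not code from the course: the function names and the `(period, computation)` tuple layout are assumptions, deadlines are taken equal to periods, and integer ceiling division is used so the sketch assumes integer task parameters.

```python
def response_time(computation, period, higher):
    """Iterate R = C + sum(ceil(R / T_j) * C_j) over higher-priority tasks.

    higher: list of (period, computation) for tasks with higher priority.
    Returns the converged response time, or None if it exceeds the period.
    """
    r = computation
    while r <= period:
        # -(-r // t_j) is ceil(r / t_j) for positive integers
        nxt = computation + sum(-(-r // t_j) * c_j for t_j, c_j in higher)
        if nxt == r:
            return r              # fixed point reached
        r = nxt
    return None                   # response time would exceed the deadline


def rm_response_times(tasks):
    """tasks: list of (period, computation); shorter period = higher priority."""
    ordered = sorted(tasks)       # sorts by period, i.e. rate-monotonic order
    return [(task, response_time(task[1], task[0], ordered[:k]))
            for k, task in enumerate(ordered)]


if __name__ == "__main__":
    for task, r in rm_response_times([(3, 1), (6, 1), (5, 1), (10, 2)]):
        print(task, "response time:", r)
```

For the example taskset this reproduces the response times 1, 2, 3 and 9 derived above.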
147© Krithi Ramamritham / Kavi Arya
3. Dynamic Planning based Approaches
• Feasibility is checked at run-time -- a dynamically arriving task is accepted only if it is feasible to meet its deadline.
– Such a task is said to be guaranteed to meet its time constraints
• One of the results of the feasibility analysis can be a schedule or plan that determines start times
• Has the flexibility of dynamic approaches with some of the predictability of static approaches
• If the feasibility check is done sufficiently ahead of the deadline, time is available to take alternative actions.
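A run-time admission check of this kind can be sketched as follows. The sketch makes strong simplifying assumptions that are not in the slides: a single processor, all jobs ready at time `now`, no resource constraints, and a plan built in earliest-deadline order. The function name `try_admit` and the `(computation, deadline)` tuple layout are hypothetical.

```python
def try_admit(accepted, new_job, now=0):
    """Accept new_job only if every job in the resulting plan meets its deadline.

    Jobs are (remaining_computation, absolute_deadline), all ready at `now`.
    Returns (plan, admitted); the plan is unchanged if the job is rejected.
    """
    candidate = sorted(accepted + [new_job], key=lambda job: job[1])
    t = now
    for computation, deadline in candidate:
        t += computation              # completion time in the candidate plan
        if t > deadline:
            return accepted, False    # guarantee impossible: reject the newcomer
    return candidate, True            # the plan doubles as the dispatch order


if __name__ == "__main__":
    plan = [(2, 5), (3, 10)]
    plan, ok = try_admit(plan, (4, 9))
    print(ok, plan)
    plan, ok = try_admit(plan, (3, 8))
    print(ok, plan)
```

Rejecting a job at arrival, rather than discovering the miss at its deadline, is what leaves time for alternative actions.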
148© Krithi Ramamritham / Kavi Arya
4. Dynamic Best-effort Approaches
• The system tries to do its best to meet deadlines.
• But since no guarantees are provided, a task may be aborted during its execution.
• Until the deadline arrives, or until the task finishes, whichever comes first, one does not know whether a timing constraint will be met.
• Permits any reasonable scheduling approach: EDF, highest-priority, …
149© Krithi Ramamritham / Kavi Arya
Cyclic scheduling
• Ubiquitous in large-scale dynamic real-time systems
• Combination of both table-driven scheduling and priority scheduling.
• Tasks are assigned one of a set of harmonic periods.
• Within each period, tasks are dispatched according to a table that just lists the order in which the tasks execute.
• Slightly more flexible than the table-driven approach
• No start times are specified
• In many actual applications, rather than making worst-case assumptions, confidence in a cyclic schedule is obtained by very elaborate and extensive simulations of typical scenarios.
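The dispatch table of a cyclic schedule can be sketched as a list of task orderings, one per minor frame. This is one simple reading of the bullets above, with assumptions of its own: every period is a multiple of the minor frame length `minor`, and the task names and function name are hypothetical.

```python
def frame_dispatch_order(tasks, minor, frames):
    """Build the per-frame dispatch order for tasks with harmonic periods.

    tasks: list of (name, period) in the order they should run within a frame.
    Only the order is recorded; no start times are specified.
    """
    table = []
    for k in range(frames):
        t = k * minor
        table.append([name for name, period in tasks if t % period == 0])
    return table


if __name__ == "__main__":
    tasks = [("A", 10), ("B", 20), ("C", 40)]   # harmonic periods
    for k, order in enumerate(frame_dispatch_order(tasks, minor=10, frames=4)):
        print(f"frame {k}: {order}")
```

The table says nothing about whether the tasks in a frame fit within it; as noted above, that confidence often comes from simulation.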
150© Krithi Ramamritham / Kavi Arya
Plan
• Special Characteristics of Real-Time Systems
• Real-Time Constraints
• Canonical Real-Time Applications
• Scheduling in Real-time systems
• Operating System Approaches
151© Krithi Ramamritham / Kavi Arya
Real-Time Operating Systems
Support process management and synchronization, memory management, interprocess communication, and I/O.
Three categories of real-time operating systems:
• small, proprietary kernels, e.g. VRTX32, pSOS, VxWorks
• real-time extensions to commercial timesharing operating systems, e.g. RT-Linux, RT-NT
• research kernels, e.g. MARS, ARTS, Spring, Polis
152© Krithi Ramamritham / Kavi Arya
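Real-Time Applications Spectrum
[Figure: applications range from hard to soft real-time; operating systems range from real-time operating systems (VxWorks, Lynx, QNX, ...) to general-purpose operating systems (Windows NT, Windows CE); Intime, HyperKernel and RTX are also shown]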
153© Krithi Ramamritham / Kavi Arya
Real-Time Applications Spectrum
[Figure: same spectrum from hard to soft real-time; VxWorks, Lynx, QNX, ... are grouped with Intime, HyperKernel and RTX as real-time operating systems; Windows NT and Windows CE as general-purpose operating systems]
154© Krithi Ramamritham / Kavi Arya
Embedded (Commercial) Kernels
Stripped down and optimized versions of timesharing operating systems.
• Intended to be fast
– a fast context switch
– external interrupts recognized quickly
– the ability to lock code and data in memory
– special sequential files that can accumulate data at a fast rate
• To deal with timing requirements
– a real-time clock with special alarms and timeouts
– bounded execution time for most primitives
– real-time queuing disciplines such as earliest deadline first
– primitives to delay/suspend/resume execution
– priority-driven best-effort scheduling mechanism or a table-driven mechanism
• Communication and synchronization via mailboxes, events, signals, and semaphores.
155© Krithi Ramamritham / Kavi Arya
Real-Time Extensions to General Purpose Operating Systems
E.g., extending LINUX to RT-LINUX, NT to RT-NT
• Advantage:
– based on a set of familiar interfaces (standards) that speed development and facilitate portability.
• Disadvantages
– Too many basic and inappropriate underlying assumptions still exist.
156© Krithi Ramamritham / Kavi Arya
Using General Purpose Operating Systems
• GPOS offer some capabilities useful for real-time system builders
• RT applications can obtain leverage from existing development tools and applications
• Some GPOSs accepted as de-facto standards for industrial applications
157© Krithi Ramamritham / Kavi Arya
Real Time Linux approaches
1. Modify the current Linux kernel to handle RT constraints
– Used by KURT
2. Make the standard Linux kernel run as a task of the real-time kernel
– Used by RT-Linux, RTAI
158© Krithi Ramamritham / Kavi Arya
Modifying Linux kernel
• Advantages
– Most problems, such as interrupt handling, already solved
– Less initial labor
• Disadvantages
– No guaranteed performance
– RT tasks don’t always have precedence over non-RT tasks.
159© Krithi Ramamritham / Kavi Arya
Running Linux as a process of a second RT kernel
•Advantages
–Can make hard real time guarantees
–Easy to implement a new scheduler
•Disadvantages
–Initial port difficult, must know a lot about underlying hardware
–Running a small real-time executive is not a substitute for a full-fledged RTOS
161© Krithi Ramamritham / Kavi Arya
GPOS -- for RT applications?
• Scheduling and priorities
– Preemptive, priority-based scheduling: non-degradable priorities; priority adjustment
– No priority inheritance
– No priority tracking
– Limited number of priorities
– No explicit support for guaranteeing timing constraints
162© Krithi Ramamritham / Kavi Arya
Thread Priority = Process class + level
Thread level     Real-time class   High class   Normal class   Idle class
Time-critical    31                15           15             15
Highest          26                15           11             6
Above Normal     25                14           10             5
Normal           24                13           9              4
Below Normal     23                12           8              3
Lowest           22                11           7              2
Idle             16                1            1              1
(High, Normal and Idle are the dynamic classes.)
163© Krithi Ramamritham / Kavi Arya
Scheduling Priorities
• Threads scheduled by executive.
• Priority based preemptive scheduling.
[Figure: precedence from highest to lowest: interrupts, Deferred Procedure Calls (DPC), system and user-level threads]
164© Krithi Ramamritham / Kavi Arya
GPOS -- for RT applications? (contd.)
• Quick recognition of external events
– Priority inversion due to Deferred Procedure Calls (DPC)
• I/O management
• Timer granularity and accuracy
– High resolution counter with resolution of 0.8 μsec.
– Periodic and one-shot timers with resolution of 1 msec.
• Rich set of synchronization objects and communication mechanisms.
– Object queues are FIFO
165© Krithi Ramamritham / Kavi Arya
Research Operating Systems
• MARS – static scheduling
• ARTS – static priority scheduling
• Spring – dynamic guarantees
166© Krithi Ramamritham / Kavi Arya
MARS -- TU, Vienna (Kopetz)
Offers support for controlling a distributed application based entirely on time events (rather than asynchronous events) from the environment.
• A priori static analysis to demonstrate that all the timing requirements are met.
• Uses flow control on the maximum number of events that the system handles.
• Based on the time driven model -- assume everything is periodic.
• Static table-driven scheduling approach
• A hardware based clock synchronization algorithm
• A TDMA-like protocol to guarantee timely message delivery
167© Krithi Ramamritham / Kavi Arya
ARTS -- CMU (Tokuda, et al)
• The ARTS kernel provides a distributed real-time computing environment.
• Works in conjunction with the static priority driven preemptive scheduling paradigm.
• Kernel is tied to various tools that a priori analyze schedulability.
• The kernel supports the notion of real-time objects and real-time threads.
• Each real-time object is time encapsulated -- a time fence mechanism: the time fence provides a run time check that ensures that the slack time is greater than the worst case execution time for an object invocation
168© Krithi Ramamritham / Kavi Arya
SPRING – UMass. (Ramamritham & Stankovic)
• Real-time support for multiprocessors and distributed systems
• Strives for a more flexible combination of off-line and on-line techniques
– Safety-critical tasks are dealt with via static table-driven scheduling.
– Dynamic planning based scheduling of tasks that arrive dynamically.
• Takes tasks' time and resource constraints into account and avoids the need to a priori compute worst case blocking times
• Reflective kernel retains a significant amount of application semantics at run time – provides flexibility and graceful degradation.
169© Krithi Ramamritham / Kavi Arya
Polis: Synthesizing OSs
• Given an FSM description of a RT application
• Each FSM becomes a task
• Signals, interrupts, and polling
• Tasks with waiting inputs handled in first-in, first-served (FIFS) order (priority order: to be done)
• Some interrupts can be made to directly execute the corresponding task
• The OS is synthesized to contain just what is needed
170© Krithi Ramamritham / Kavi Arya
Reconfigurable Lab -- Hardware Environment
171© Krithi Ramamritham / Kavi Arya
IIT-KReSIT Reconfigurable Computing Lab Projects (2002)
• Network packet-processing
- Berkeley Packet Filter
- Packet Classifier (a la Stiliadis / Lakshman)
• Video codec
- MPEG-4 with encryption
• Video-editing System
• Real-time reactive control systems
- Inertial Navigation System (ILS)
- Flight simulation
- Scheduling co-processor
• Satellite Error Correcting codec
172© Krithi Ramamritham / Kavi Arya
References
1. Lava material based on personal communication with Mary Sheeran, with illustrations, Nov. 2002. Also “A Tutorial on Lava: A Hardware Description Language and Verification System”, Koen Claessen, Mary Sheeran, Aug. 2000.
2. Handel-C material based on “Handel-C v3.0 Language Reference Manual”, 2001, Celoxica Ltd.
3. “Embedded System Design: A Unified Hardware/Software Introduction”, Frank Vahid, Tony Givargis, John Wiley & Sons Inc., 2002.
4. “Customising Graphics Applications: Techniques & Programming Interface”, Henry Styles & Wayne Luk, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.
173© Krithi Ramamritham / Kavi Arya
References…
7. K.Ramamritham and J.A. Stankovic, Scheduling Algorithms and Operating Systems Support for Real-Time Systems, Proceedings of the IEEE, Jan 1994, pp. 55-67.
8. K. Ramamritham et al. Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations, IEEE Real-Time Technology and Applications Conference, June 1998.
9. RT-Linux : http://www.rtlinux.org
© Krithi Ramamritham / Kavi Arya
Summary
• What are Embedded Systems?
• What is Embedded software?
• New Approaches to building ESW
• Real-time support for ESW