Presented by:
Jeff Schaffer
Sr. Field Applications Engineer
QNX Software Systems
“Embedded Operating Systems:
The State of the Art”
QNX is a leading provider of real-time operating system (RTOS) software, development tools, and services for mission-critical embedded applications.
2
Role of the Embedded OS
Traditional
– Permit sharing of common resources of the computer (disks, printers, CPU)
– Provide low-level control of I/O devices that may be complex, time dependent, and non-portable
– Provide device-independent abstractions (e.g. files, filenames, directories)
Additional Roles
– Prevent common causes of system failure and instability; minimize impact when they occur
– Extend system life cycles
– Isolate problems during development and at runtime
3
Architecture Comparison
REAL TIME EXECUTIVE
– Advantage: single address space
– Disadvantage: single address space, different binary images
– Failure: means reboot
MONOLITHIC KERNEL
– Advantage: apps run in own memory space
– Disadvantage: kernel not protected, kernel testing
– Failure: might mean reboot
TRUE MICROKERNEL
– Advantages: modules run in own memory space; add/replace services on the fly; reusable modules; direct hardware access
– Disadvantage: context switching
– Failure: usually does not mean reboot
4
[Diagram: microkernel (x86, PPC, MIPS, SH4, ARM, StrongARM, XScale) surrounded by separate processes: App, Photon GUI, Flash fsys, Audio driver, TCP/IP, Serial driver, HTTP server, Java, Process Manager]
• Dynamic architecture makes hot-start and upgrades easy, even with drivers
• Philosophy: a trusted kernel running a system of untrusted software components
• Processes provide a reusable component model with well defined message interfaces
• Processes communicate via messages or other methods, such as shared memory. Permits loose inter-module coupling.
• No requirement for filesystem, GUI, etc.
MicroKernel – Neutrino
5
Typical Forms of IPC
[Diagram: Process 1 and Process 2 communicating through the kernel by pipes, mailboxes, message queues (msg 2 to msg 5 queued) and a shared memory object mapped into each process address map]
6
Which Architecture for me?
Depends on your application and processor!
• Simple apps (such as single control loops) generally only need a real-time executive
• As the system becomes more complex, you typically need a more complex operating system architecture
• Need to look at factors such as scalability and reliability
Do standards matter?
• APIs
– Two most common standards
• Advantages of standards
– Portability of code
– Hiring of programmers
8
Less than 1 second response?
Less than 1 millisecond response?
Less than 1 microsecond response?
Do I need Real-Time?
What is Real Time?
Maybe ...
9
Real-Time
"A real-time system is one in which the correctness of the computations not only
depends upon the logical correctness of the computation but also upon the time at which
the result is produced. If the timing constraints of the system are not met, system
failure is said to have occurred."
Donald Gillies (comp.realtime FAQ)
10
A Simple Example...
“it doesn’t do you any good if the signal that cuts fuel to the jet engine arrives a millisecond after the engine
has exploded”
Bill O. Gallmeister - POSIX.4 Programming for the Real World
11
ATM
“Hard” vs. “Soft” Real Time
Hard
– absolute deadlines
– late responses cannot be tolerated and may have a catastrophic effect on the system
– example: flight control
Soft
– systems which have reduced constraints on “lateness”; e.g. late responses may still have some value
– still must operate very quickly and repeatably
– example: cardiac pacemaker
12
Real-time OS Requirements
Operating system factors that permit real-time:
– Thread Scheduling
– Control of Priority Inversion
– Time Spent in Kernel
– Interrupt Processing
13
Factor #1: Scheduling
Non real-time scheduling
– round-robin
– FIFO
– adaptive
Real-time scheduling
– priority based
– sporadic
14
Sequence:
1. Low priority task acquires bus mutex to transfer data
2. High priority task blocks until mutex released
3. Medium priority task pre-empts low priority task
4. Watchdog timer resets the system since Bus Manager has not run in some time
Factor #2: Priority Inversion
Source: Embedded Systems Programming
Information Bus Manager
Meteorological Data Gathering Task
Communications Task
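The sequence above can be replayed in a small tick-based simulation. This is only a sketch: the task names, arrival times and work amounts are made-up values, the scheduler is a simplified single-CPU model, and priority inheritance is shown as one common way of controlling inversion rather than as the method any particular RTOS uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int      # larger number = higher priority
    arrival: int       # tick at which the task becomes ready
    work: int          # ticks of CPU time the task needs
    needs_mutex: bool  # True if all of its work is done holding the shared mutex


def simulate(tasks, inherit):
    """Return {task name: tick at which it finished} on one CPU."""
    remaining = {t.name: t.work for t in tasks}
    finished = {}
    owner = None  # task currently holding the mutex, if any
    tick = 0
    while any(left > 0 for left in remaining.values()):
        ready = [t for t in tasks if t.arrival <= tick and remaining[t.name] > 0]
        # A task that needs the mutex is blocked while another task owns it.
        blocked = [t for t in ready if t.needs_mutex and owner not in (None, t)]
        runnable = [t for t in ready if t not in blocked]

        def effective_priority(t):
            # With inheritance, the owner runs at the priority of its highest waiter.
            if inherit and t == owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        if runnable:
            current = max(runnable, key=effective_priority)
            if current.needs_mutex:
                owner = current
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finished[current.name] = tick + 1
                if owner == current:
                    owner = None  # mutex released when the work completes
        tick += 1
    return finished


# Hypothetical workload mirroring the four-step sequence.
TASKS = [
    Task("low", priority=1, arrival=0, work=3, needs_mutex=True),
    Task("high", priority=3, arrival=1, work=1, needs_mutex=True),
    Task("medium", priority=2, arrival=2, work=5, needs_mutex=False),
]

without_inheritance = simulate(TASKS, inherit=False)
with_inheritance = simulate(TASKS, inherit=True)
```

With these illustrative numbers the high priority task finishes at tick 9 without inheritance, because the medium priority task runs to completion first, and at tick 4 with inheritance.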
15
Factor #3: Kernel Time
• Kernel operations must be pre-emptible
– if they are not, an unknown amount of time can be spent in the kernel performing an operation on behalf of a user process
– can cause real-time process to miss deadline
• All kernels have some window (or multiple windows) of time where pre-emption cannot occur
• Some operating systems attempt to provide real-time capability by adding “checkpoints” within the kernel so they can be interrupted at these points
16
Example
A kernel call is a software interrupt
[Diagram: timeline of a kernel call from "int KER" to "iret". Entry: a few opcodes, interrupts off. Kernel operation, which may include a message pass: usecs to msecs, unlocked, pre-emptable. Locked section: usecs, no pre-emption, interrupts on. Unlocked section: usecs, pre-emptable. Exit: a few opcodes, interrupts off.]
Split Out Long Operations
[Diagram: the Neutrino microkernel (Nto) and the Process Manager (Proc), with the labels Thread, Sync, Message, Sched, Signal, Channel, Clock, Timer, Intr, Fork, Exec, Pathname, Spawn, Mmap, Waitpid, Session, UID/GID, Debug]
18
Factor #4: Interrupts
This is broken down into the following areas:
• Method of handling the interrupt processing chain
• Handling of Nested Interrupts
19
Interrupt Processing Chain
[Diagram 1: INT x -> ISR -> IST and INT y -> ISR -> IST]
IST scheduled whenever queue emptied, non-deterministic
[Diagram 2: INT x -> ISR -> IST and INT y -> ISR -> IST]
IST scheduled by normal OS scheduling, deterministic
20
Can I Make Any Conventional OS Real-Time?
Method
– Add real-time layer below conventional OS, running conventional OS as a low priority real-time process
– Add real-time layer to hardware service layer
Problems
– different APIs
– real-time layer proprietary
– existing OS apps not R/T
– poor communication between operating systems
– loss of control issue
[Diagram: conventional OS alongside a real-time kernel]
Scalability
22
Scaling Solution #1: Single Board, Single Node
[Diagram: CPU, bridge, memory, bus, PCI, peripherals]
The only scaling possible is a CPU replacement
23
Scaling Solution #2: Single Board, Multiple Nodes
• Relatively simple to implement
• Allows “scaling-on-demand”
• Suitable if nodes have independent “work”
• Inter-node IPC slower than memory access
• Complexity in maintaining global view of data
• Difficult to break up computationally-intensive tasks
[Diagram: Node 1 and Node 2, each with its own CPU, bridge, bus, PCI and peripherals]
24
Scaling Solution #3: Single Board, Multiple Processors
[Diagram: CPU0 and CPU1 sharing one bridge, memory, bus, PCI and peripherals]
• Tightly-coupled symmetric multiprocessing (SMP)
• All processors have a symmetric and consistent view of physical memory and peripherals
• Scales processing power
• Need software (RTOS) support
25
The SMP OS Dilemma
SMP systems to date use desktop operating systems; not responsive enough for real-time requirements
• Application servers
• Databases
• Web servers
Typical real-time operating systems (home-built or commercial), such as are commonly used in routers and switches today, do not have SMP support
SMP-capable real-time operating systems run the CPUs as independent processors with independent operating systems
26
SMP Support
True (tightly coupled) SMP support
Only the kernel needs SMP awareness
Transparent to application software and drivers - identical binaries for UP and SMP systems
Automatic scheduling across all CPUs
27
QNX “True” SMP
[Diagram: a thread running on CPU 0 and a thread running on CPU 1, each belonging to a process; ready queues ordered by priority from 63 down to 0; other threads held in blocked states]
STATE_RUNNING thread on each processor
Priority-based ready queues
Each thread can be locked to a specific CPU by using a processor affinity mask
Scheduler remembers last CPU thread ran on
– Minimize thread migration
– Optimize cache usage
Highest-priority READY thread always immediately scheduled
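The dispatch rules above (priority order, affinity mask, preference for the last CPU) can be sketched as a simple greedy assignment. This is not QNX's scheduler code; the thread fields and the `dispatch` function are hypothetical names chosen for illustration, and a greedy pass like this does not always find the best possible placement.

```python
def dispatch(ready_threads, num_cpus):
    """Assign the highest-priority ready threads to CPUs.

    Each thread is a dict with 'name', 'priority' (higher runs first),
    'affinity' (bit i set means the thread may run on CPU i) and 'last_cpu'.
    Returns {cpu index: thread name}.
    """
    assignment = {}
    by_priority = sorted(ready_threads, key=lambda t: t["priority"], reverse=True)
    for thread in by_priority:
        free = [cpu for cpu in range(num_cpus)
                if cpu not in assignment and thread["affinity"] & (1 << cpu)]
        if not free:
            continue  # no eligible CPU left; the thread stays in its ready queue
        # Prefer the CPU the thread last ran on, to keep its cache contents warm.
        cpu = thread["last_cpu"] if thread["last_cpu"] in free else free[0]
        assignment[cpu] = thread["name"]
    return assignment


# Hypothetical ready threads on a two-CPU system.
threads = [
    {"name": "a", "priority": 10, "affinity": 0b11, "last_cpu": 1},
    {"name": "b", "priority": 20, "affinity": 0b01, "last_cpu": 0},
    {"name": "c", "priority": 5, "affinity": 0b11, "last_cpu": 0},
]
placement = dispatch(threads, num_cpus=2)
```

Here thread "b" is locked to CPU 0 by its mask, "a" returns to CPU 1 where it last ran, and "c" waits because both CPUs are taken by higher-priority threads.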
28
Why Is Cache Important?
Cache efficiency is probably the single largest determinant of performance on SMP
Coherent view of physical memory is maintained using cache snooping
Cache snooping is done at the CPU bus level and so operates at lower speeds than core
Coherency is “invisible” to software
29
Performance Implications
• Snoop traffic expected on SMP
• Cache hits generally cause no bus transaction
• Multiple processors writing to same location degrades performance (ping-pong effect)
• Performance degrades when large amount of data modified on one processor and read on the other
• Sometimes it is better to have specific threads in a process run on same CPU
30
Designing for SMP: One Big Task
[Diagram: giant app with a single thread]
• Will not scale with SMP: a single thread can run on only one CPU at a time
31
Designing for SMP: Single Threaded Tasks
[Diagram: App 1 and App 2, each with a single thread]
• Works with SMP
• Process data can be shared with shared memory
• Good concurrency, some complexity
• IPC not usually as efficient as memory sharing
32
Designing for SMP: Scaling Software with Threads
[Diagram: server process containing multiple threads]
• Single copy server
• All process data is implicitly shared and accessible
• Can achieve good concurrency with less complexity
• POSIX synchronization used
– Mutexes
– Semaphores
– Condition variables
– Usually more efficient than inter-process synchronization
Note: SMP finds concurrency problems fast!
33
Optimizing Compute-intensive Applications
[Diagram: application with a main thread and a pool of worker threads]
• Pool of worker threads
• Dispatch “work” to worker threads
• Scales very well with SMP
• The tricky part is “breaking up” the problem
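A minimal sketch of the worker-pool pattern, using Python's standard `concurrent.futures` module. The problem (a sum of squares) and the helper names are placeholders chosen for illustration. Note that in CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so this shows the structure of splitting and dispatching work, not the SMP speed-up a native threaded application would see.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n, parts):
    """Break range(n) into at most `parts` contiguous (low, high) chunks."""
    step = max(1, -(-n // parts))  # ceiling division, never zero
    return [(low, min(low + step, n)) for low in range(0, n, step)]


def sum_squares(bounds):
    """The unit of "work" handed to one worker thread."""
    low, high = bounds
    return sum(i * i for i in range(low, high))


def parallel_sum_squares(n, workers=4):
    """Main thread: dispatch chunks to the pool, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split(n, workers)))


total = parallel_sum_squares(1000)
```

The "breaking up" step is `split`: it only works this easily because each chunk is independent of the others.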
34
Interrupt Handling
[Diagram: CPU 0 and CPU 1 receiving IRQ 7, IRQ 8, IRQ 9 and IRQ 10; each interrupt runs an ISR and then an IST]
IRQ   CPU
7     0
8     1
9     1
10    1
• Interrupt processed on CPU that was targeted
• Can distribute load by handling interrupts on different processors
• Sometimes not the optimal strategy due to cache effects
35
Scaling Solution #4: Multiple Processors/Nodes
[Diagram: Node 1 and Node 2, each a board with CPU0, CPU1, bridge, bus, PCI and peripherals]
36
Example
[Diagram: chassis with line cards attached to multiple networks, joined by a high-speed interconnect and a low-speed bus]
The QNET MicroNetwork
[Diagram: microkernel nodes joined by QNET over a LAN, the Internet or a backplane; components shown include App, Flash fsys, CDROM fsys, TCP/IP, Audio, Photon and Process Manager]
Messages flow transparently through QNET from one message bus to another.
All applications and servers become network distributed without any special code.
38
QNX Qnet Manager
[Diagram: two line cards and a control card]
• Extends message passing across multiple QNX microkernels
• Over anything with a packet driver:
– Ethernet, RapidIO, 3GIO, InfiniBand, Stargen, etc.
• Class of service
• Use symbolic prefixes to make client code independent of location of resource manager
39
QNET Class of Service
[Diagram: two line cards and a control card joined by links]
One or multiple links can connect different nodes.
40
QNET: Load-Balanced Distribution
[Diagram: two line cards and a control card joined by links]
Data is sent out the link which will deliver it the fastest. This is based upon link speed and queue length for each link.
41
QNET: Ordered Distribution
[Diagram: two line cards and a control card joined by links]
Data is sent out a primary link. If it fails, data is diverted to a secondary link. The primary link is probed and when it comes back online, data is diverted back to it.
42
QNET: Parallel Distribution
[Diagram: two line cards and a control card joined by links]
Data is sent out both links at the same time. A failure on either of the links is handled gracefully.
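The three distribution policies can be compared in a few lines of code. This is a sketch, not Qnet's implementation: the `Link` fields and the delivery-time estimate (queued bytes plus payload, divided by link speed) are assumptions based only on the slide's statement that the choice depends on link speed and queue length.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    bytes_per_sec: float
    queued_bytes: int = 0
    up: bool = True


def load_balanced(links, payload_bytes):
    """Pick the live link with the shortest estimated delivery time."""
    live = [link for link in links if link.up]
    if not live:
        return []

    def estimated_delay(link):
        return (link.queued_bytes + payload_bytes) / link.bytes_per_sec

    return [min(live, key=estimated_delay)]


def ordered(links, payload_bytes):
    """Use the first live link in preference order (primary, then secondary)."""
    for link in links:
        if link.up:
            return [link]
    return []


def parallel(links, payload_bytes):
    """Send the same data on every live link."""
    return [link for link in links if link.up]


# Hypothetical links: a fast but congested one and a slow idle one.
fast = Link("fast", bytes_per_sec=100e6, queued_bytes=5_000_000)
slow = Link("slow", bytes_per_sec=10e6)
chosen = load_balanced([fast, slow], payload_bytes=1000)
```

Because `ordered` re-evaluates link state on every send, traffic returns to the primary as soon as it is marked up again, which matches the probing behaviour described for ordered distribution.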
43
Designing for Networked SMP: Single/Multi Threaded Tasks
[Diagram: App 1 with multiple threads and App 2 with a single thread]
• Different processes necessary for different nodes
• Works with SMP
• Process data can be shared with shared memory
• IPC for networked communication
44
Transparent Redirection
[Diagram: a client on the client node opens /service, which links to /net/a/dev/service on node A or /net/b/dev/service on node B]
• Simple link provides transparent redirection
• Process has to monitor status of link
• Switch over is not transparent to client
45
Transparent Redirection
[Diagram: a client on the client node opens /dev/service, provided by a service manager that forwards to /net/a/dev/service on node A or /net/b/dev/service on node B]
• Service manager acts as a proxy
• Monitors health of and/or load on services/nodes
• Switch over is transparent to client
46
Redundant Links
[Diagram: a client on the client node opens /dev/service, provided by a service manager that forwards to both /net/a/dev/service on node A and /net/b/dev/service on node B]
• Requests serviced redundantly
• First/majority/best result
• Different implementations
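Two of the result-selection policies named above, first and majority, can be sketched as follows. The function names are hypothetical, and `None` is used here to stand for a replica that did not answer. A "best result" policy is omitted because it needs an application-specific way of scoring replies.

```python
from collections import Counter


def first_result(replies):
    """Return the first reply that arrived, or None if no replica answered."""
    for reply in replies:
        if reply is not None:
            return reply
    return None


def majority_result(replies):
    """Return the value given by a strict majority of the replies received."""
    answered = [reply for reply in replies if reply is not None]
    if not answered:
        return None
    value, votes = Counter(answered).most_common(1)[0]
    return value if votes > len(answered) / 2 else None
```

Majority voting is what makes "different implementations" useful: a bug in one implementation is outvoted as long as the others agree.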
[Diagrams: nodes connected by Qnet over a MOST bus, running components such as FLASHFSYS, CDROMFSYS, TCP/IP, BlueTooth, Graphics, Browser, Audio, Photon and apps]
49
Reliability and Availability
50
Why?
• Embedded systems are different!
• Failure in an embedded system can have severe effects - like death …
“Pilots really hate to be told they have to reboot their plane while in flight”
Walter Shawlee
51
Definitions
MTBF: Mean Time Between Failures
– The average number of hours between failures for a large number of components over a long time (e.g. MIL-HDBK-217).
MTTR: Mean Time To Repair
– Total amount of time spent performing all corrective maintenance repairs divided by the number of repairs
MTBI: Mean Time Between Interruptions
– The average number of hours between failures while a redundant component is down.
52
Defining HA
Reliability
– Quantified by failure rate (MTBF)
– Time to resume service after failure is MTTR
Availability
– Allows for failure, with quick service restoration
– As MTTR -> 0, Availability -> 100%
5 Nines
– < 5 minutes downtime / year (> 99.999% uptime)
– Assume faults exist: design to contain, notify, recover and restore rapidly
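The link between MTBF, MTTR and availability can be made concrete with the commonly used steady-state formula, availability = MTBF / (MTBF + MTTR). The slides do not state this formula, so treat it as an assumption; the function names and the 365-day year are also choices made for this sketch.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - avail) * MINUTES_PER_YEAR
```

With any fixed MTBF, shrinking MTTR toward zero drives `availability` toward 1.0, which is why fast repair matters as much as rare failure. At exactly 99.999% the function gives a little over five minutes of downtime per year.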
53
Annual Cost of Downtime versus Availability
Annual availability   Annual losses
99%                   $68,372,928
99.9%                 $6,837,293
99.99%                $683,729
99.999%               $68,373
Source: Gartner Group ($13,000/minute Cross-industry Average)
Costs speak for themselves
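The chart's figures follow from multiplying the unavailable fraction of a year by the cost per minute. The sketch below reproduces them; the 365.24-day year is an assumption inferred from the totals, not something the slide states.

```python
COST_PER_MINUTE = 13_000             # dollars per minute, from the slide's source line
MINUTES_PER_YEAR = 365.24 * 24 * 60  # assumed year length, inferred from the totals


def annual_loss(avail):
    """Dollars lost per year at a given availability fraction."""
    return (1.0 - avail) * MINUTES_PER_YEAR * COST_PER_MINUTE


losses = {level: round(annual_loss(level))
          for level in (0.99, 0.999, 0.9999, 0.99999)}
```

Each additional nine divides the annual loss by ten.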
54
Availability via Reliability and Repair
low MTTR -> high availability
– System is composed of reliable components that are protected from each other and that communicate ONLY through well known interfaces.
this leads to
– fault isolation
– speedy recovery
– reset a component not a board/system
– dynamic control
• stop/start
• upgrade
55
Software vs Hardware HA
Hardware HA
– utilizes redundancy of key components
• a single fault cannot cause all redundant components to fail (no single point of failure, SPOF), e.g. mirrored disks, multiple system boards, I/O cards
– Active/active, active/spare, active/standby
Software is a Significant Cause of Downtime
But that’s only part of the problem!!!
56
Comparison
Software Fault    40%
Planned Outage    30%
Operator Error    15%
Environment        5%
Hardware          10%
57
High Level Look at a Core Router/Switch
One or more control elements
[Diagram: 20-slot optical shelf with OCLD cards (1W to 4W and 1E to 4E), OCI cards (1A to 4B), OCM cards (A and B), a shelf processor and a filler, plus maintenance panel, fiber management trough, optical multiplexer tray (OMX) and cooling unit]
58
Handling Failures
[Diagram: 20-slot optical shelf with OCLD, OCI and OCM cards, shelf processor, maintenance panel, fiber management trough, optical multiplexer tray (OMX) and cooling unit]
Isolate Fault to a Board
Switch to Backup
59
May not be in the Hardware
[Diagram: 20-slot optical shelf with one board expanded into its software stack: Route Manager, TCP/IP stack, SNMP Manager, Flash Drivers, Device Manager, Network Manager and applications running on an RTOS above the hardware]
Isolate fault to a SW component
60
Ideal: Identify and Fix
[Diagram: software stack of Route Manager, TCP/IP stack, SNMP Manager, Flash Drivers, Device Manager, Network Manager and applications on an RTOS, with one faulty software component highlighted]
Faulty Software Component
• Isolate and contain
• Repair (e.g. restart)
• Notify
• Diagnose
• Upgrade
61
Component-level recovery rarely done
• Lack of suitable protection and isolation
• Lack of modularity
• Tight component coupling
• Few dynamic capabilities
Software failures normally handled by:
• Hardware watchdogs
• Redundant boards
62
Repair Time
Board Replacement       Hours
Reboot                  Minutes
Failover to Standby     Seconds
SW Component Restart    Tens of milliseconds
SW Failover             Milliseconds
63
High Availability Manager
[Diagram: microkernel running TCP/IP, FLASHFSYS, DISKFSYS, ATM and the HA Manager]
• Process Memory Violation
• Kernel notifies HA Manager
• Dump file for post-mortem analysis
• HA Manager restarts service
64
[Diagram: HAM and HAM Guardian alongside Driver, Stack and App; checkpointed state is kept for the HAM and for the App]
HA Manager (HAM) monitors components, sends notification of component failure
Heart-beat services detect component hangs
Core file on crash can be created for debugging and analysis
Checkpointing permits recovering current state
Notification and Recovery
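Heartbeat-based hang detection can be sketched in a few lines. This is not the HAM API: the class, its method names and the explicit `now` timestamps are all hypothetical, chosen so the logic is easy to follow and test. A component that crashes is reported by the kernel, but a component that hangs is only noticed when its heartbeats stop.

```python
class HeartbeatMonitor:
    """Flags components whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}  # component name -> time of most recent heartbeat

    def beat(self, component, now):
        """Record a heartbeat from a component at time `now`."""
        self.last_beat[component] = now

    def hung(self, now):
        """Return the sorted names of components that have gone quiet."""
        return sorted(name for name, seen in self.last_beat.items()
                      if now - seen > self.timeout)
```

A monitor would call `hung` periodically and trigger notification or a restart for each name returned.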
65
• A second “shadow” server attaches to the same name
Recovery
66
• A second “shadow” server attaches to the same name
• If primary faults, new clients connect to shadow server
• Old clients can re-connect to shadow server.
Recovery
67
• Start a new “shadow” server
Recovery
68
[Diagram: Client connected to Server v1.0 and New Client connected to Server v1.1, both through /dev/service]
Service Upgrades
New version of server attaches to same name
New clients connect to new server
Old server exits when all old clients have exited
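The upgrade sequence above can be modelled with a small name registry. The sketch is illustrative only: `Server`, `NameSpace` and their methods are hypothetical and do not correspond to QNX pathname-manager calls.

```python
class Server:
    def __init__(self, version):
        self.version = version
        self.clients = 0
        self.running = True


class NameSpace:
    """Maps a pathname to the servers attached to it, newest last."""

    def __init__(self):
        self.attached = {}

    def attach(self, path, server):
        """Register a server under a name; later attachments take precedence."""
        self.attached.setdefault(path, []).append(server)

    def connect(self, path):
        """New clients always reach the most recently attached server."""
        server = self.attached[path][-1]
        server.clients += 1
        return server

    def disconnect(self, path, server):
        """Drop a client; a superseded server exits when its last client leaves."""
        server.clients -= 1
        servers = self.attached[path]
        if server.clients == 0 and server is not servers[-1]:
            server.running = False
            servers.remove(server)
```

Existing clients keep their connection to the old server for as long as they need it, so the upgrade never interrupts a session in progress.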
69
QNX Momentics Tools
70
Design Goals
• Tools needed to be easy to learn
• Tools needed to take advantage of QNX
• Tools needed to integrate with tools from other vendors, company-designed tools and industry-specific tools, so that all of them work with our tools and with each other
• Tools needed to be customizable to the user or the company
71
[Diagram: QNX® Momentics. Hosts: Windows, Solaris, QNX Neutrino. The IDE Workbench (Eclipse framework) hosts the source debugger, Java code developer, C/C++ code developer, target information, system builder, profiler, Photon app builder and memory analysis, and invokes command-line tools; 3rd-party tools shown are Virtio, Rational and … TBA. Also included: BSPs, DDKs and the Neutrino runtime. The host connects over Ethernet, serial, JTAG or ROMulator to a target agent on the QNX® Neutrino® RTOS (microkernel, Photon microGUI, Flash fsys, TCP/IP, HTTP server, Java); the target is labelled XScale]
The Best Tools and the Best RTOS
72
QNX IDE: Standards based
• IBM donated framework
• Java IDE
• 200 person-years of effort
• Open Source
Consortium founding members include
73
System Profiling
74
Systems Analysis Toolkit
[Diagram: application, TCP/IP protocol and device driver run on the instrumented microkernel, which writes a trace to the system event log; the log feeds data display, printer, and statistical & numerical analysis]
System Events
• interrupts
• scheduler
• messages
• system calls
System Characterization
• Performance analysis
• Field diagnostic
• Live or post-mortem
Providing Technology for Today… Architecture for Tomorrow
Irvine Office - 949-727-0444
David Weintraub - Regional Sales Manager
Woodland Hills Office - 818-227-5105
Jeff Schaffer - Sr. Field Applications Engineer