linux preempt-rt internals
TRANSCRIPT
● Deadline
● Hard Real-time
– Meet deadline Deterministically– Guaranteed worst case
● Soft Real-time
– Best Effort
Real-time System
Real-time System● Assumption
– critical task has highest priority– task workload is fixed
● Deadline miss
– Undetermined latency– Unpredictable event
Critical TaskLatency?
Critical Event DeadlineEvents?
Latency● Time interval from event trigger till handler task react this
event
Critical TaskLatency?
Critical Event DeadlineEvents?
From Event to Received
InterruptTaken
InterruptReceived inUser/Thread
Context
Critical section with interrupts disabled
HWException
“Top Half” / ISR Exit from IRQ Reschedule Context SwitchSignal/Wakeup
Resource Conflicts
Preemption● Re-schedule when high priority task is ready
● Increase responsibility
● Decrease throughput
Toward full preemption
● Most important aspects of Real-time
– Controlling latency by allowing kernel to be preemptible everywhere
Non-Preemptive● CONFIG_PREEMPT_NONE
● Preemption is not allowed in Kernel Mode
● Preemption could happen upon returning to user space
Non-Preemptive Issue● Latency of Non-Preemptive configuration
Task A(user mode)
Interrupt handlerWakes up Task B
Task B(user mode)
System call
Task A(kernel mode)
Task A(kernel mode)
Interrupt
?Return from syscall
UnboundLatency
Preemption Point● CONFIG_PREEMPT_VOLUNTARY● Insert explicit preemption point in Kernel
– might_sleep → __cond_resched
● Kernel can be preempted only at preemption point
Preemptible Kernel● CONFIG_PREEMPT● Implicit preemption in Kernel
● preempt_count
– Member of thread_info– Preemption could happen when preempt_count == 0
Preemptible Kernel● Kernel preemption could happen when
– return from irq handler__irq_svc:
svc_entry
irq_handler
#ifdef CONFIG_PREEMPT
get_thread_info tsk
ldr r8, [tsk, #TI_PREEMPT] @ get preempt count
teq r8, #0 @ if preempt count != 0
bne 1f @ return from exeption
ldr r0, [tsk, #TI_FLAGS] @ get flags
tst r0, #_TIF_NEED_RESCHED @ if NEED_RESCHED is set
blne svc_preempt @ preempt!
...
#endif
Full Preemptive● Goal: Preempt Everywhere except
– Preempt disable
– Interrupt disable
● Reduce non-preemptible cases in kernel– spin_lock
– Interrupt
Spinlock● Task will be busy waiting until it acquires the lock● Spinlock in Preemptible Linux (CONFIG_PREEMPT)
– Uniprocessor● preempt_disable
– SMP● preempt_disable● Lock acquire, busy waiting
– No sleeping● Convert spinlock to rt-mutex
– Sleeping
Priority Inversion
Acquiresa lock
Priority
Time
preempted
Tries to getthe same
lock waits
● High priority task is preempted by medium priority task
● Unbound waiting for high priority task
Priority Inheritance
Acquiresa lock
Priority
Time
preempted
Tries to getthe lock
waits
Executes with the priority of the waiting process
Releasesthe lock
Acquiresthe lock
preempted
Task triggered by IRQ
interruptlatency
Interrupthandler Scheduler
Interrupt
handlerduration
schedulerlatency
schedulerduration
Processcontext
Interruptcontext
Makes thetask runnable
Total LatencyTask Sleep Task wakeup
Task interrupted by IRQ
Task Run
Interrupthandler
Task Resume
Interrupt
Processcontext
Interruptcontext
Interrupthandler ...
...
Interrupt Preempt LatencyUnbound??
Original Linux Kernel
User Space
User Context
Interrupt Handlers
KernelSpace
InterruptContext
SoftIRQs
Hi p
riotaskle
t s
Netw
or kS
tack
Tim
ers
Reg
ula rtaskle
t s
...
SchedulingPoints
Process Thread
Kernel Thread
Priority of interrupt context is always higher than others
Non-determined Issue ● Interrupt context can always preempt others
● Interrupt as an external event
– Interrupt number of a time interval is non-determinated– Nature of interrupt, can not be avoided
● Behavior of interrupt handler is not well defined
– Non-determined interrupt handler
IRQ in PREEMPT_RT● Threaded IRQ
– IRQ handler is actually a kernel thread by default– Hard IRQ handler only wakes up IRQ handler thread
● Behavior of hard IRQ handler is well defined– Original IRQ handler (No Delayed) is reserved by
IRQF_NO_THREAD flag.
● Remove softirq
– ksoftirqd as a normal kernel thread, handles all softirqs
PREEMPT_RT
User Space
NODELAY Interrupt Handlers
KernelSpace
Ne
two
r k Sta
c k
Tim
ers
Tasklet s
SchedulingPoints
Process Thread
Kernel Threads
Experiments on ARM● OMAP-L138 (ARM11; 375/456 MHz)● Cyclictest
– measuring accuracy of sleep and wake operations of highly prioritized realtime threads
● cyclictest -t1 -p 80 -n -i 1000 -l 600000 -h 1000● Ping target with inverval 0.004s
Tool for Observation● Linux Kernel Tracing Tool● Event tracing
– Tracing kernel eventsmount t debugfs debugfs /sys/kernel/debug
– Event: irq_handler_entry, irq_handler_exit, sched:*
(/sys/kernel/debug/tracing/events/)tracecmd record e irq_handler_entry \
e irq_handler_exit \
e sched:*
tracecmd report
Trace Log Format● tracecmd1061 [000] 142.334403: irq_handler_entry:
irq=21 name=critical_irq
CurrentTask
CPU# TimeStamp
EventName
Message
trace-cmd-1061 [000] 142.334403 irq_handler_entry Irq=21name=critical_irq
Trace Log
interruptlatency
Interrupthandler Scheduler
Interrupt
handlerduration
schedulerlatency
schedulerduration
Processcontext
Interruptcontext
Makes thetask runnable
trace-cmd critical_task
irq_handler_entry irq_handler_exit
sched_wakeup*
sched_wakeup**
sched_switch
sched_wakeup*: wakeup critical_tasksched_wakeup**: wakeup ksoftirqd
Trace Logtracecmd1061 [000] 142.334403: irq_handler_entry: irq=29 name=critical_irq
tracecmd1061 [000] 142.334437: sched_wakeup: critical_task:1056 [9] success=1 CPU:000
tracecmd1061 [000] 142.334456: irq_handler_exit: irq=29 ret=handled
tracecmd1061 [000] 142.334480: sched_wakeup: ksoftirqd/0:3 [98] success=1 CPU:000
tracecmd1061 [000] 142.334526: sched_switch: tracecmd:1061 [120] R ==> critical_task:1056 [9]
Practical Issues● Interrupt
– Non-determined external events– Collision: Several Interrupt happens almost at the same
time– Preemption: Interrupt takes place when task is woke up
but not starts yet.
CriticalHandler
Critical TaskHandler1
CriticalIRQ
IRQ1 IRQ2 ...
Handler2 ...
Critical Task Latency = Critical IRQ Latency + Critical Handler Cost + IRQ1 Latency + Handler1 Cost + … + + Schedule Latency
Interrupt Preemption
● Trace Logtracecmd1079 [000] 1074.488916: irq_handler_entry: irq=29 name=critical_irqtracecmd1079 [000] 1074.488970: sched_wakeup: critical_task:1056 [9] success=1 CPU:000tracecmd1079 [000] 1074.488992: irq_handler_exit: irq=29 ret=handledtracecmd1079 [000] 1074.489018: sched_wakeup: ksoftirqd/0:3 [98] success=1 CPU:000tracecmd1079 [000] 1074.489046: irq_handler_entry: irq=59 name=ohci_hcd:usb1tracecmd1079 [000] 1074.489056: irq_handler_exit: irq=59 ret=handledtracecmd1079 [000] 1074.489075: sched_wakeup: irq/59ohci_hcd:979 [49] success=1 CPU:000tracecmd1079 [000] 1074.489113: sched_stat_runtime: comm=tracecmd pid=1079 runtime=694958 [ns] vruntime=30984068262 [ns]tracecmd1079 [000] 1074.489139: sched_switch: tracecmd:1079 [120] R ==> critical_task:1056 [9]critical_task1056 [000] 1074.489233: irq_handler_entry: irq=21 name=clockeventcritical_task1056 [000] 1074.489361: irq_handler_exit: irq=21 ret=handledcritical_task1056 [000] 1074.489424: sched_switch: critical_task:1056 [9] D ==> irq/59ohci_hcd:979 [49]
Interrupt Preemption
● Defer Other Interrupt
CriticalHandler
Critical TaskHandler1
CriticalIRQ
IRQ1 IRQ2 ...
Handler2 ...
CriticalHandler
Critical Task Handler1
CriticalIRQ
IRQ1 IRQ2 ...
Handler2 ...
Interrupt Preemption
● Disable other interrupts when critical interrupt occurs● Schedule a tasklet to re-enable other interrupts.
– Critical Task's priority could be higher than tasklet.– Ensure determination of critical task.
CriticalHandler
Critical Task Handler1
IRQ1 IRQ2 ...
Handler2 ...
CriticalIRQ
tasklet
Interrupt Preemption
CriticalHandler Critical TaskHandler1
IRQ1 CriticalIRQ
● Several Interrupt happens almost at the same time● Set Critical IRQ with highest priority
● Depended on hardware
Interrupt Collision
Handler2
IRQ2
Almost at the same time !!!
● High Resolution Timer
– Non threaded Interrupt● Non-determined IRQ handler
– Heavy and variable jitter● 100us ~ 160us (375MHZ ARM926EJS)
– Important and complex component of Linux Kernel● Tick timer● Timer for RR scheduler● nanosleep, clock_nanosleep ● Futex, rtmux● Others
– Can not be disabled
Many Unpredicted Timer IRQ sources !!
HRT Impact
Simplify Timer Interrupt● Move to Periodic Timer
– Tick is the only one timer source– However, the behavior of timer interrupt is still non-
determined– Timer is not precise enough
● Periodic task could not be Real-time task
Conclusion● PREEMPT_RT patch significantly improves the
preemptiveness of Linux kernel.– However we should keep in mind that these results are
statistical in nature rather than being conclusive.● Using PREEMPT_RT will not guarantee that your system is
hard real-time.– Your system and applications will also have to be designed
properly including correct priorities, use of deterministic APIs, allocation of critical resources.