bkk16-203 irq prediction or how to better estimate idle time
TRANSCRIPT
![Page 1: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/1.jpg)
Presented by
Date
Event
Sched idleNext irq event predictionDaniel Lezcano
BKK16-203 March 8, 2016
Linaro Connect BKK16
![Page 2: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/2.jpg)
Introduction
● Idle mechanism
● Limitations
● New approach
![Page 3: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/3.jpg)
Introduction
● Idle does not work if the break even is not reached○ Consumes more energy○ Worst performance
● Target residency is the break even○ https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc/ComputingTargetResidency
● Idle duration prediction is the key
![Page 4: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/4.jpg)
The idle process
● No task for each sched class
● Default task is the idle task
![Page 5: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/5.jpg)
The idle task
● Two infinite loops○ One to ensure idle routine is called next time○ One to handle wakeup + idle
● Yield to another task if any○ As soon as there is a task to be run, the idle task is
switched● Enters / exit the idle routine
○ Idle time measurements
![Page 6: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/6.jpg)
Current approach
● When a CPU wakes up from an idle state○ idle time measurement
● When a CPU goes idle○ statistics on idle time○ Reuse statistics to guess estimate the idle duration
![Page 7: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/7.jpg)
Current approach
● What woke up the CPU ?○ a timer ? an IO device ? an IPI ?
■ https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc/WakeUpSources
● It is not possible to know. It results to:○ statistics on timers but they are deterministic○ statistics on IPI, thus on the scheduler behavior
![Page 8: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/8.jpg)
Current approach
● This wake up sources soup leads to some complex re-adjustments in the statistics○ empiric○ platform specific (x86)○ monolithic
● It is hard to tell if the prediction was correct
![Page 9: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/9.jpg)
Current approach
● Prediction and governor are tied○ CPUidle is a standalone subsystem○ No interaction with the scheduler
● Scheduler can’t use the prediction value to anticipate anything○ Hard to integrate the power subsystems
![Page 10: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/10.jpg)
What do we want ?
● The scheduler to act:○ as a governor○ pro actively○ with knowledge of future events
![Page 11: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/11.jpg)
What do we need ?
● Split prediction and idle state selection
● Increase the prediction accuracy
● Let the scheduler to have access to the prediction
![Page 12: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/12.jpg)
How do we do that ?
● Track the interrupts independently○ measure their intervals○ apply a simple mathematical formula for their
behavior○ guess estimate the next event per interrupt
● Timers and IPI are out of the equation● Full view of what is happening on the system
![Page 13: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/13.jpg)
Analysis
● A network trafic + console output○ kernelshark trace-net.dat
● A SSD random read○ kernelshark trace-ssd.dat
![Page 14: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/14.jpg)
Analysis - SSD
![Page 15: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/15.jpg)
Analysis - SSD
● Beginning of the measurement => writing a big file
● Rest of the measurement => random read● Write is hard to predict● Read is stable
![Page 16: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/16.jpg)
Analysis - Network
![Page 17: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/17.jpg)
Analysis - Network
● Intervals are stable
● Repeating patterns○ Four values○ Looks like gaussian => normal law ?
![Page 18: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/18.jpg)
Analysis - Serial console
![Page 19: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/19.jpg)
Analysis - serial
● Very stable interrupt
● Repeating pattern easily predictable○ ~1ms x 5 + ~210ms
![Page 20: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/20.jpg)
Analysis
Interrupt A Interrupt B
Sampling
Next event Interrupt A
Next event Interrupt BMin
Next event is coming from interrupt A
![Page 21: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/21.jpg)
Analysis - Conclusion
● With a correct prediction for each interrupt○ increase of the accuracy for the next event
● Mixing all interrupts, we can choose the earliest one
![Page 22: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/22.jpg)
Analysis - Conclusion
● Risk ? ○ Cumulative error on each irq prediction○ Mathematical model vs performance in kernel○ Different behavior for each irq
● Challenge ?○ Solve the above○ Find the right mathematical model
● Approach ?○ Incremental
![Page 23: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/23.jpg)
IRQ based next event status
● Submitted upstream as RFC○ https://lwn.net/Articles/670505/
● Positive feedbacks and article in LWN○ https://lwn.net/Articles/673641/
![Page 24: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/24.jpg)
Scheduler + idle
● Peter Ziljstra :○ “I think the scheduler simply wants to say: we expect
to go idle for X ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.”
○ https://lkml.org/lkml/2013/11/11/353“
![Page 25: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/25.jpg)
Scheduler + idle
● Two new APIs:○ s64 sched_idle_next_wakeup(void);
■ “[ … ] we expect to go idle for X ns [ … ]”○ int sched_idle(s64 duration, unsigned int latency);
■ “[ …] go do your thing [ …]”○ (Latency API is already there)
![Page 26: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/26.jpg)
Scheduler + idle
● From the scheduler, the information ...○ When a CPU will wake up ?○ How long a CPU will be idle ?
● … are now available● The scheduler has the information to take a
decision regarding the energy saving○ the scheduler will be able to act as a governor
![Page 27: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/27.jpg)
Scheduler + idle
● Task rebalancing○ scheduler knows if it is worth to migrate a task
■ eg. is the target CPU about to wake up ?
● Scheduler can force latencies
![Page 28: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/28.jpg)
Scheduler + idle
● Topology and idle information○ Compute target residency at different level:
■ Cpu■ Cluster■ “SoC”
○ Predicted idle time is the time intersection between components■ Eg. idle(cluster0) = idle(cpu0) ∩ idle(cpu1)
![Page 29: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/29.jpg)
Some results
● Simple approach but better results in some cases
● SMP is much better but UP worst
![Page 30: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/30.jpg)
Some results
![Page 31: BKK16-203 Irq prediction or how to better estimate idle time](https://reader031.vdocument.in/reader031/viewer/2022021918/58a8a2a81a28ab7f458b532b/html5/thumbnails/31.jpg)
Next steps
● Instrumentation● Improve the model
○ Bayesian network (TBS)● Upstream the first bits
○ Approach is accepted● Enlarge community (very important)
○ More testing and validation● Scheduler tuning