PM Infrastructure in the Linux* Kernel: Current Status And Future
Rafael J. Wysocki
Intel Open Source Technology Center
October 4, 2016
Rafael J. Wysocki (Intel OTC) PM in the Linux Kernel October 4, 2016 1 / 32
Outline
1 Introduction
  The Goal
  Power Management Frameworks
2 System-Wide Power Management (Sleep States)
  How It Works
  The Future
3 Working-State Power Management
  I/O Device Runtime PM
  CPU Power Management
4 Resources
Introduction The Goal
What We Are After
Objective
Use only as much energy as needed to achieve sufficient performance.
What software can do
1 Determine how much throughput/capacity is needed.
2 Determine how much latency is acceptable.
3 Enumerate the hardware PM features.
4 Use them to deliver exactly as much as necessary.
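The four steps above can be sketched as a small selection routine. This is only an illustration under stated assumptions: the `OperatingPoint` type, the `pick_operating_point` helper and all of the numbers are hypothetical and do not come from the talk or from the kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One hardware PM configuration (hypothetical model)."""
    name: str
    power_mw: float    # average power drawn in this configuration
    capacity: float    # throughput it can deliver (arbitrary units)
    latency_us: float  # worst-case response latency


def pick_operating_point(points, needed_capacity, max_latency_us):
    """Return the lowest-power point meeting both requirements, or None."""
    candidates = [
        p for p in points
        if p.capacity >= needed_capacity and p.latency_us <= max_latency_us
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.power_mw)


# Step 3: the enumerated hardware features (made-up values).
POINTS = [
    OperatingPoint("low", 200.0, 1.0, 500.0),
    OperatingPoint("mid", 450.0, 2.0, 200.0),
    OperatingPoint("high", 900.0, 4.0, 50.0),
]

# Steps 1, 2 and 4: needed capacity, acceptable latency, then the choice.
choice = pick_operating_point(POINTS, needed_capacity=1.5, max_latency_us=300.0)
```

Picking the minimum-power candidate, rather than the fastest one, is what "exactly as much as necessary" amounts to in this model.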
What about the Linux kernel?
The Kernel’s Role
Provide means of control
State selection (component level).
Carrying out transitions between states (the “mechanics”).
Response to events (e.g. wakeup).
Estimate the needs
Use information available internally (e.g. from the CPU scheduler).
React to the actions of user space.
Follow trends/patterns.
Introduction Power Management Frameworks
Working-State PM and System-Wide PM (Sleep States)
[Figure: power states from Sleeping to Working. Sleeping: Hibernate, Suspend (Deep), Suspend-to-Idle (Shallow). Working: Runtime Idle, Runtime Active.]
Power Management Frameworks in The Linux Kernel
[Figure: PM frameworks grouped by state. Working: cpufreq, PM QoS, cpuidle, devfreq, Runtime PM. Sleeping: Hibernation, System Suspend, Device Suspend.]
System-Wide Power Management (Sleep States) How It Works
System-Wide PM Overview
Global energy-saving states, “frozen” user space
Suspend-to-Idle, Standby, Suspend-to-RAM, Hibernate.
Controlled by user space
1 User space selects the target state.
2 User space decides when to start transitions.
  Direct command (sysfs write).
  Autosleep interface.
Used on many systems
Desktop distributions, Android (autosleep interface).
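The "direct command (sysfs write)" path can be sketched from user space as follows. The sketch assumes the standard `/sys/power/state` file, which lists the supported states (for example `freeze`, `standby`, `mem`, `disk`) and starts a transition when one of them is written to it. The helper names are hypothetical, and a real write needs root privileges and will actually put the machine to sleep.

```python
from pathlib import Path

STATE_FILE = "/sys/power/state"  # standard sysfs location


def supported_sleep_states(state_file=STATE_FILE):
    """Return the sleep states the kernel advertises, e.g. ['freeze', 'mem']."""
    return Path(state_file).read_text().split()


def request_sleep(state, state_file=STATE_FILE):
    """Ask the kernel to enter `state` by writing it to the state file."""
    if state not in supported_sleep_states(state_file):
        raise ValueError(f"sleep state {state!r} is not supported here")
    # On real sysfs this write returns only after the system has resumed.
    Path(state_file).write_text(state)
```

The `state_file` parameter exists only so that the helpers can be pointed at an ordinary file for experiments instead of the real interface.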
System-Wide PM Frameworks
[Figure: the PM frameworks diagram from the "Power Management Frameworks in The Linux Kernel" slide, repeated.]
System Suspend/Resume Control Flow
[Figure: control flow of Full Suspend and Suspend to Idle, summarized below.]
Full Suspend
  Suspend: Call Notifiers, Freeze Tasks, Device Suspend, Nonboot CPU Offline, System Core Offline, Platform Offline.
  Wait For a Wakeup Event
  Resume: Platform Online, System Core Online, Nonboot CPU Online, Device Resume, Thaw Tasks, Call Notifiers.
Suspend to Idle
  Suspend: Call Notifiers, Freeze Tasks, Device Suspend.
  Wait For a Wakeup Interrupt
  Resume: Device Resume, Thaw Tasks, Call Notifiers.
Device Suspend callbacks: .prepare(), .suspend(), .suspend_late(), .suspend_noirq()
Device Resume callbacks: .resume_noirq(), .resume_early(), .resume(), .complete()
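The two flows in the figure can be written out as a step list. This is a simplified model and not kernel code: the function name is hypothetical, each device callback phase is run for every device before the next phase starts, and the ordering of devices within a phase is deliberately ignored.

```python
SUSPEND_PHASES = ("prepare", "suspend", "suspend_late", "suspend_noirq")
RESUME_PHASES = ("resume_noirq", "resume_early", "resume", "complete")


def suspend_resume_steps(devices, full_suspend):
    """List the steps of one suspend/resume cycle (simplified model)."""
    steps = ["call notifiers", "freeze tasks"]
    for phase in SUSPEND_PHASES:
        steps += [f"{dev}.{phase}()" for dev in devices]
    if full_suspend:
        # Only full suspend takes CPUs, the system core and the platform down.
        steps += ["nonboot CPU offline", "system core offline", "platform offline"]
    steps.append("wait for wakeup")
    if full_suspend:
        steps += ["platform online", "system core online", "nonboot CPU online"]
    for phase in RESUME_PHASES:
        steps += [f"{dev}.{phase}()" for dev in devices]
    steps += ["thaw tasks", "call notifiers"]
    return steps


s2idle = suspend_resume_steps(["dev0"], full_suspend=False)
full = suspend_resume_steps(["dev0", "dev1"], full_suspend=True)
```

Comparing the two lists shows that suspend-to-idle is the full flow with the CPU, system core and platform steps left out.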
System Hibernation/Restore Control Flow
[Figure: control flow of Hibernation and Restore, summarized below.]
Hibernation
  Call Notifiers
  Freeze Tasks
  Freeze Transition: Device Freeze (.prepare(), .freeze(), .freeze_late(), .freeze_noirq()), Nonboot CPU Offline, System Core Offline
  Create Image
  Thaw Transition: System Core Online, Nonboot CPU Online, Device Thaw (.thaw_noirq(), .thaw_early(), .thaw(), .complete())
  Save Image
  Power Off Transition: Device Power Off (.prepare(), .poweroff(), .poweroff_late(), .poweroff_noirq()), Nonboot CPU Offline, System Core Offline
  Turn Off Power
Restore
  Boot Kernel
    Call Notifiers
    Freeze Tasks
    Load Image
    Freeze Transition: Device Freeze (.prepare(), .freeze(), .freeze_late(), .freeze_noirq()), Nonboot CPU Offline, System Core Offline
    Restore Memory
    JUMP
  Image Kernel
    Restore Transition: System Core Online, Nonboot CPU Online, Device Restore (.restore_noirq(), .restore_early(), .restore(), .complete())
    Thaw Tasks
    Call Notifiers
System-Wide Power Management (Sleep States) The Future
The Future of System-Wide PM
Hibernation
Doesn’t go away (supported on ARM64 now, works with KASLR).
Encrypted images problematic.
Will persistent memory make it obsolete?
Future platforms may not support “platform offline”
Suspend-to-RAM and Standby may not make sense.
CPU offline/core offline steps unnecessary.
Suspend-to-Idle can leverage the existing infrastructure.
Suspend-to-Idle vs Working-State PM
For long inactivity periods, Suspend-to-Idle always uses less energy than working-state PM
Suspends timekeeping (no timer interrupts).
Longer average time between wakeups.
Different mechanisms for different use cases
Suspend on closing laptop lid (working-state PM insufficient).
Opportunistic idle (Suspend-to-Idle insufficient).
Suspend-to-Idle might be more efficient than it is today
Framework/driver issues to fix.
Working-State Power Management I/O Device Runtime PM
Suspend/Resume of I/O Devices
[Figure: the PM frameworks diagram from the "Power Management Frameworks in The Linux Kernel" slide, repeated.]
Device PM Operations and The Driver Core
[Figure: layers involved in a device PM operation. Driver Core; Subsystem Layer (Bus Type, Class); Device Driver; Action on the Device.]
Device PM Operations and PM Domains
[Figure: layers involved in a device PM operation, with PM domains. Driver Core; Subsystem Layer (PM Domain, Bus Type, Class); Device Driver; Action on the Device.]
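How the driver core might pick a callback from these layers can be sketched as below. The precedence used here (PM domain, then device type, then class, then bus type, with the driver's own callback as a fallback when the chosen layer provides none) is an assumption of this sketch and should be checked against Documentation/power/devices.txt. The dictionary-based device model and the function name are hypothetical.

```python
# Layers consulted by the (hypothetical) core, highest priority first.
SUBSYSTEM_LAYERS = ("pm_domain", "type", "class", "bus")


def select_pm_callback(device, name):
    """Return (layer, callback) for PM operation `name`, or None.

    `device` maps a layer name to a dict of callbacks; a layer that is
    present but lacks the callback defers to the driver.
    """
    for layer in SUBSYSTEM_LAYERS:
        if layer in device:
            callback = device[layer].get(name)
            if callback is not None:
                return layer, callback
            break  # first layer present wins, even without this callback
    callback = device.get("driver", {}).get(name)
    if callback is not None:
        return "driver", callback
    return None


dev = {
    "pm_domain": {"suspend": "domain_suspend"},
    "bus": {"suspend": "bus_suspend"},
    "driver": {"suspend": "driver_suspend"},
}
layer, callback = select_pm_callback(dev, "suspend")
```

In this model a PM domain, when present, overrides the bus type and class callbacks for the device.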
Device Runtime PM Operations
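[Figure: device runtime PM state diagram. States: Active, Suspended. An Active device is checked with "Is Idle?": YES leads to Suspend, NO leaves it Active. Resume takes a Suspended device back to Active.]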
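The state diagram can be modeled with a usage counter, loosely in the spirit of the kernel's `pm_runtime_get_sync()` and `pm_runtime_put()` helpers. The class below is a hypothetical user-space model, not the kernel implementation: it ignores autosuspend delays, parent devices and error handling.

```python
class RuntimePMDevice:
    """Toy model of a device with Active and Suspended runtime PM states."""

    def __init__(self):
        self.state = "suspended"
        self.usage_count = 0
        self.transitions = []  # record of suspend/resume actions taken

    def get(self):
        """Mark the device as in use, resuming it first if needed."""
        self.usage_count += 1
        if self.state == "suspended":
            self.transitions.append("resume")
            self.state = "active"

    def put(self):
        """Drop one user and run the idle check."""
        if self.usage_count == 0:
            raise RuntimeError("unbalanced put()")
        self.usage_count -= 1
        self.idle_check()

    def idle_check(self):
        """The 'Is Idle?' decision: suspend only when nobody uses the device."""
        if self.state == "active" and self.usage_count == 0:
            self.transitions.append("suspend")
            self.state = "suspended"


dev = RuntimePMDevice()
dev.get()   # first user: resume
dev.get()   # second user: already active
dev.put()   # still one user left: stays active
dev.put()   # idle now: suspend
```

Counting users keeps the device active for as long as at least one of them needs it, which is the "NO" branch of the diagram.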
Runtime PM on Future Systems
Hardware design trends
Increasing integration of components.
Increasing level of support for aggressive PM.
System-on-a-Chip (SoC) configurations
CPU packages contain I/O devices.
Package low-power states depend on I/O devices.
PM features of different components are interdependent.
Challenge: Take dependencies into account
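The dependency between package low-power states and I/O devices can be illustrated with a toy model. Everything here is hypothetical: the state names, the device names and the rule that each deeper package state requires a larger set of suspended devices are assumptions made only for the sake of the example.

```python
# Package states from shallowest to deepest, each with the set of devices
# that must be runtime-suspended before it can be entered (made-up data).
PACKAGE_STATES = [
    ("PKG_ACTIVE", frozenset()),
    ("PKG_LIGHT", frozenset({"gpu"})),
    ("PKG_DEEP", frozenset({"gpu", "usb"})),
    ("PKG_DEEPEST", frozenset({"gpu", "usb", "audio"})),
]


def deepest_allowed_package_state(package_states, suspended_devices):
    """Return the deepest state whose device requirements are all met."""
    allowed = None
    for name, required in package_states:
        if required <= suspended_devices:  # subset test
            allowed = name
        else:
            break  # deeper states need at least as much, so stop here
    return allowed


state = deepest_allowed_package_state(PACKAGE_STATES, {"gpu", "usb"})
```

A single device that stays active (here `audio`) is enough to keep the whole package out of its deepest state, which is why runtime PM coverage across all drivers matters on SoCs.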
Working-State Power Management CPU Power Management
CPU Power Management Frameworks
[Figure: the PM frameworks diagram from the "Power Management Frameworks in The Linux Kernel" slide, repeated.]
CPU Power Management Overview
[Figure: the components of CPU power management. CPU Scheduler; CPUFreq (Performance States); CPUIdle (Idle States).]
Cooperation Between Components Is Key
[Figure: the CPU Scheduler, CPUFreq (Performance States) and CPUIdle (Idle States), shown together under the label "Linux PM".]
CPUIdle Workflow
[Figure: CPUIdle workflow. The CPU Scheduler hands an idle CPU to CPUIdle, which selects one of the Idle States using the next timer event (from Timekeeping) and the latency tolerance (from PM QoS). The CPU leaves the idle state on Wakeup.]
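The selection step in this workflow can be sketched as a simple rule: take the deepest idle state whose exit latency fits the PM QoS latency tolerance and whose target residency fits the time until the next timer event. This is a minimal sketch, not the algorithm of any particular CPUIdle governor, and the state table is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # time needed to leave the state
    target_residency_us: int  # minimum stay for the state to pay off


def select_idle_state(states, next_timer_us, latency_limit_us):
    """Pick the deepest suitable state; `states` is ordered shallow to deep."""
    chosen = states[0]  # the shallowest state is always the fallback
    for state in states[1:]:
        if state.exit_latency_us > latency_limit_us:
            break  # would violate the latency tolerance
        if state.target_residency_us > next_timer_us:
            break  # the CPU would wake up before the state pays off
        chosen = state
    return chosen


STATES = [  # hypothetical values
    IdleState("shallow", 2, 2),
    IdleState("medium", 50, 150),
    IdleState("deep", 200, 800),
]

picked = select_idle_state(STATES, next_timer_us=1000, latency_limit_us=100)
```

With a 100 us latency limit the deep state is ruled out by its exit latency, even though the next timer event is far enough away for it.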
CPUFreq: Old-Style Governors and intel_pstate
[Figure: two flows, both starting from a governor callback invoked by the CPU Scheduler.]
intel_pstate: Governor callback, Estimate utilization, Adjust performance.
Generic Governors: Governor callback, Queue up work, Estimate utilization (Idle/Total Time Ratio from Timekeeping), Adjust performance (Driver).
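The "Idle/Total Time Ratio" estimate used by the generic governors can be illustrated as follows. This is a simplified sketch in the style of a load-proportional governor: the function names, the `up_threshold` default and the linear mapping from load to frequency are assumptions of the example and not a copy of any in-kernel governor.

```python
def load_percent(idle_delta_us, total_delta_us):
    """Estimate CPU load over a sampling window from idle and total time."""
    if total_delta_us <= 0:
        return 0
    busy_us = max(0, total_delta_us - idle_delta_us)
    return 100 * busy_us // total_delta_us


def next_frequency_khz(load, min_khz, max_khz, up_threshold=80):
    """Map a load percentage to a target frequency (assumed policy)."""
    if load > up_threshold:
        return max_khz  # heavily loaded: go straight to the maximum
    return min_khz + load * (max_khz - min_khz) // 100


load = load_percent(idle_delta_us=250, total_delta_us=1000)
target = next_frequency_khz(load, min_khz=800_000, max_khz=2_800_000)
```

Because the estimate needs idle-time deltas over a window, it reacts only after the window has elapsed, which is one reason for driving the governor from the scheduler instead.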
CPUFreq: schedutil Governor
[Figure: schedutil flow. The CPU Scheduler invokes the Governor callback, which computes the frequency and checks "Fast switching supported?". YES: Adjust performance (Driver) directly. NO: Queue up work, then Adjust performance (Driver).]
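The flow in the figure can be sketched as below. The frequency formula follows the one documented for schedutil, next_freq = 1.25 * max_freq * util / max, computed here with integer arithmetic. The rest (function names, the clamp to the maximum frequency, the list used as a work queue) is a hypothetical model rather than kernel code.

```python
def compute_frequency(util, max_util, max_freq_khz):
    """next_freq = 1.25 * max_freq * util / max, in integer arithmetic."""
    return (max_freq_khz + (max_freq_khz >> 2)) * util // max_util


def governor_callback(util, max_util, max_freq_khz,
                      fast_switch, set_frequency, work_queue):
    """Compute the target and apply it directly or defer it (model only)."""
    freq = min(compute_frequency(util, max_util, max_freq_khz), max_freq_khz)
    if fast_switch:
        set_frequency(freq)  # driver can switch from scheduler context
    else:
        work_queue.append(lambda: set_frequency(freq))  # apply later
    return freq


applied = []      # frequencies handed to the (pretend) driver
work_queue = []   # deferred work items

governor_callback(512, 1024, 2_000_000, True, applied.append, work_queue)
governor_callback(1024, 1024, 2_000_000, False, applied.append, work_queue)
after_callbacks = list(applied)   # only the fast-switch request so far
work_queue.pop(0)()               # a worker runs the deferred item
```

The 25% headroom in the formula means a CPU at half of its capacity is asked to run at 62.5% of the maximum frequency instead of 50%.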
CPUFreq: Scheduler Hints, Cross-CPU Updates
Observations
It is good to take blocking on I/O (IOwait) into account.
Different scheduling classes require different handling.
Solution: Scheduler hints (work in progress)
Pass hints from the CPU scheduler to governor callbacks.
A hint can represent the reason for the update or similar.
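One way to picture scheduler hints is as a set of flags passed along with the utilization to the governor callback. The sketch below is purely illustrative: the `Hint` flags, the callback name and the policy (maximum frequency for real-time tasks, a floor of half the maximum after I/O wait) are all assumptions of this example and not the interface under development.

```python
from enum import Flag, auto


class Hint(Flag):
    """Hypothetical reasons attached to a utilization update."""
    NONE = 0
    IOWAIT = auto()  # the task on this CPU has just been blocked on I/O
    RT = auto()      # the update comes from a real-time scheduling class


def hinted_governor_callback(util, max_util, max_freq_khz, hints=Hint.NONE):
    """Choose a frequency from utilization plus hints (assumed policy)."""
    if Hint.RT in hints:
        return max_freq_khz  # assumed: real-time work gets full speed
    freq = max_freq_khz * util // max_util
    if Hint.IOWAIT in hints:
        freq = max(freq, max_freq_khz // 2)  # assumed boost floor
    return freq


plain = hinted_governor_callback(256, 1024, 2_000_000)
boosted = hinted_governor_callback(256, 1024, 2_000_000, Hint.IOWAIT)
```

The same utilization thus yields different frequencies depending on why the update happened, which is the information the governor cannot derive by itself.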
Observation
In some cases it is beneficial to invoke governor callbacks cross-CPU.
The Future of CPU Power Management
Complex topologies (hardware threads, modules, packages)
Many different scheduling strategies are potentially viable.
PM-Aware Scheduling Conjecture
Energy efficiency may be improved without hurting performance by making the CPU scheduler drive CPU power management as a whole.
Predictions (usual disclaimers apply)
PM-aware scheduling will enter the mainline kernel.
CPUFreq and CPUIdle will be combined.
Conclusion
The Linux kernel supports power management in a number of ways.
Both system-wide and working-state (runtime) PM are supported.
Support for system-wide PM in device drivers is generally better than support for runtime PM.
I/O device runtime PM support is improving.
CPU PM is well supported, consolidation in progress.
Hardware design trends increase PM complexity.
Interdependencies between PM features are challenging.
Questions?
Resources
References
R. J. Wysocki, CPUfreq and The Scheduler: Revolution in CPU Power Management (http://events.linuxfoundation.org/sites/events/files/slides/cpufreq_and_scheduler_0.pdf).
Neil Brown, Improvements in CPU frequency management (http://lwn.net/Articles/682391/).
Len Brown, Suspend/Resume at the Speed of Light (http://events.linuxfoundation.org/sites/events/files/slides/Brown-Linux-Suspend-at-Speed-of-Light-LC-EU-2015.pdf).
R. J. Wysocki, Getting More Out Of System Suspend In Linux (http://events.linuxfoundation.org/sites/events/files/slides/linux_suspend.pdf).
R. J. Wysocki, Power Management in the Linux Kernel – Current Status and Future (http://events.linuxfoundation.org/sites/events/files/slides/kernel_PM_plain.pdf).
Jonathan Corbet, The cpuidle subsystem (http://lwn.net/Articles/384146/).
R. J. Wysocki, Why We Need More Device Power Management Callbacks (https://events.linuxfoundation.org/images/stories/pdf/lfcs2012_wysocki.pdf).
Documentation and Source Code
Documentation/cpu-freq/*
Documentation/cpuidle/*
Documentation/power/devices.txt
Documentation/power/pci.txt
Documentation/power/runtime_pm.txt
include/linux/cpufreq.h
include/linux/cpuidle.h
include/linux/device.h
include/linux/pm.h
include/linux/pm_runtime.h
include/linux/suspend.h
drivers/acpi/processor_idle.c
drivers/base/power/*
drivers/cpufreq/*
drivers/cpuidle/*
kernel/power/*
Legal Information
Intel is a trademark of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2016 Intel Corporation. All rights reserved.
Thanks!
Thank you for your attention!
How System-Wide PM Is Used
Laptop/Handheld Usage Scenarios
[Figure: power over time in two panels, Laptop and Handheld. Each usage cycle runs from Wake (Resume) through Runtime Active and Runtime Idle periods with the Display on to Sleep (Suspend); an Interrupt is also marked. The Handheld panel shows several such cycles.]
Dark Resume Scenario
[Figure: power over time. One cycle of Wake (Resume), Runtime Active, Runtime Idle and Sleep (Suspend) with the Display on, followed by two cycles of Wake (Resume), Runtime Active and Sleep (Suspend) without the Display.]