How to get realistic C-states latency and residency?
Vincent Guittot
Agenda
● Overview
● Exit latency
● Entry latency
● Residency
● Conclusion
Overview
● PMWG uses hikey960 for testing our developments on a b/L (big.LITTLE) system
  ○ Cluster off and residency values in the DT binding looked really high:

| C-state | Entry latency (us) | Exit latency (us) | Residency time (us) |
|---|---|---|---|
| CPU off (big and LITTLE) | 40 | 70 | 3000 |
| LITTLE cluster off | 500 | 5000 | 20000 |
| Big cluster off | 1000 | 5000 | 20000 |

● Decided to find a way to check the correctness of the figures
● How to easily get realistic figures for the C-states table of my platform?
  ○ Without expensive equipment
  ○ Without deep knowledge of power management and idle states
  ○ Define values for a platform or check current values
C-state latency
● Prepare:
  ○ Cache maintenance
  ○ Abortable
● Entry:
  ○ HW & SW sequence to enter the idle state
  ○ Not abortable
● Exit:
  ○ HW & SW sequences needed to bring the CPU back to the running state

[Figure: timeline of a C-state transition: Exec → Prepare → Entry → Idle → Exit → Exec]

* Read Documentation/devicetree/bindings/arm/idle-states.txt for details
How to measure latency?
● Trigger contention
  ○ Compete for access to critical resources
  ○ Look for the worst values
● Trigger the slowest path
  ○ Cache flush for entry latency
Test environment
● CPU isolation
  ○ Isolate CPUs from external noise and background activity
  ○ Works great for the big cluster
  ○ Not enough for the little cluster
    ■ Boot CPU is in the little cluster
    ■ Interrupts are pinned to CPU0
    ■ A “lot” of spurious activity is pinned on the little cluster
● Use rt-app
  ○ Synchronized wake-up of CPUs
  ○ Range of wake-up periods
  ○ Logs events and phase durations
● Hikey960
  ○ Modified to give access to the VDD_4V2 voltage domain
● Arm Energy Probe USB dongle
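As a small illustration of the isolation step: Linux expresses IRQ affinity as a hexadecimal CPU mask (for example in /proc/irq/<N>/smp_affinity). The sketch below builds such masks from CPU lists. The helper name `cpus_to_mask` is made up for this example, and the CPU numbering (0-3 little, 4-7 big) is an assumption about the hikey960 layout rather than something stated on the slides.

```python
def cpus_to_mask(cpus):
    """Return the hex CPU mask (without 0x prefix) for an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # one bit per CPU, bit 0 is CPU0
    return format(mask, "x")


# Assumed hikey960 layout: CPUs 0-3 are the little cluster, 4-7 the big one.
LITTLE_CPUS = range(0, 4)
BIG_CPUS = range(4, 8)

if __name__ == "__main__":
    print("little cluster mask:", cpus_to_mask(LITTLE_CPUS))
    print("big cluster mask:", cpus_to_mask(BIG_CPUS))
    print("CPU0 only mask:", cpus_to_mask([0]))
```

Writing the CPU0-only mask to an IRQ's smp_affinity file would keep that interrupt away from the CPUs under test, which is the situation the slide describes for the big cluster.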
Exit latency
1st test: exit latency
● Enable only 1 state to force cpuidle to select it
  ○ Not fully robust
● Wake up CPUs simultaneously
● rt-app logs wake-up latency
  ○ Get min, max, average and std-dev

[Figure: timelines for CPU0 to CPU3, from the timer IRQ to the clock read on each CPU]
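One way to "enable only 1 state" is the cpuidle sysfs interface, where each state has a `disable` file under /sys/devices/system/cpu/cpu<N>/cpuidle/state<M>/. The sketch below only builds the list of writes and prints them as a dry run. The function names, the number of states (3) and the choice of which index to keep are assumptions for illustration, and the slides do not say that this is how the states were forced.

```python
import os

CPUIDLE_ROOT = "/sys/devices/system/cpu"


def plan_single_state(cpus, n_states, keep):
    """Return (path, value) writes that leave only idle state `keep` enabled."""
    if not 0 <= keep < n_states:
        raise ValueError("keep must be a valid state index")
    writes = []
    for cpu in cpus:
        for state in range(n_states):
            path = os.path.join(
                CPUIDLE_ROOT, f"cpu{cpu}", "cpuidle", f"state{state}", "disable"
            )
            # Writing "1" disables a state, "0" re-enables it.
            writes.append((path, "0" if state == keep else "1"))
    return writes


def apply_writes(writes):
    """Apply the plan; needs root on a kernel that exposes cpuidle in sysfs."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run only: CPUs 4-7, 3 states assumed, keep the deepest one (index 2).
    for path, value in plan_single_state(range(4, 8), n_states=3, keep=2):
        print(f"echo {value} > {path}")
```

The "not fully robust" caveat on the slide still applies: disabling the other states does not guarantee that the remaining one is entered on every idle period.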
1st test: exit latency
[Figure: exit latency charts at 903 MHz and 2362 MHz, showing max, min and 95% values]
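The statistics named for this test (min, max, average, std-dev, plus the 95% series on the charts) can be computed from the logged wake-up latencies as sketched below. This assumes the 95% series is a 95th percentile, which the slides do not spell out, and the sample values are invented purely to exercise the function.

```python
import statistics


def summarize(latencies_us):
    """Return min, max, mean, population std-dev and 95th percentile of samples."""
    if not latencies_us:
        raise ValueError("no samples")
    ordered = sorted(latencies_us)
    n = len(ordered)
    # Nearest-rank percentile with integer arithmetic: rank = ceil(0.95 * n).
    rank = (95 * n + 99) // 100
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "std": statistics.pstdev(ordered),
        "p95": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Hypothetical wake-up latencies in microseconds, not measured data.
    samples = [2650, 2700, 2710, 2720, 2750, 2800, 2810, 2850, 2880, 2900]
    for name, value in summarize(samples).items():
        print(f"{name}: {value}")
```

Looking at the percentile next to the maximum helps separate the usual worst case from rare outliers such as spurious wake-ups.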
1st test: exit latency
[Figure: exit latency charts at 903 MHz and 2362 MHz]
1st test: exit latency
● One CPU wakes up faster than the others
  ○ Most probably the one that takes a “lock” first
● Frequency of the other cluster impacts exit latency
  ○ Flattens the difference between min and max OPP
  ○ +400us for max OPP when the other cluster runs at the lowest OPP
● Local frequency has a limited impact in the end
  ○ Around 200us on the 2900us budget
● Synchronized wake-up with the other cluster has a limited impact on latency
  ○ A few dozen us
● Firmware mode has an impact
  ○ Release vs debug mode
All latencies
● Big cluster off is slower than LITTLE cluster off
  ○ Most probably more things are shut down compared to little
    ■ Like powering down a power domain
● Measured latency includes the full wake-up path
  1. Timer interrupt fires (at almost the programmed timestamp, as the granularity of the timer is 52ns)
  2. PM coprocessor HW wake-up sequence (when involved)
  3. ATF firmware resume sequence (when involved)
  4. cpuidle driver
  5. cpuidle framework
  6. Idle thread, including starting/stopping the nohz idle tick
  7. Switching to the rt-app thread
  8. Reading the clock

| Latency (us) | big CLUSTER | big CPU | big WFI | little CLUSTER | little CPU | little WFI |
|---|---|---|---|---|---|---|
| exit | 2900 | 550 | 70 | 1600 | 650 | 100 |
Entry latency
2nd test: entry latency
● Enable only 1 state to force cpuidle to select it
  ○ Not fully robust
● rt-app logs phase durations
  ○ Get min, max, average and std-dev
● Increase the sleep duration step by step
  ○ Phase duration increases once the sleep duration reaches the entry latency

[Figure: timelines for CPU0 with the timer IRQ firing after progressively longer sleep durations]
2nd test: entry latency (single cpu)
[Figure: chart with annotations “sleep duration becomes longer than entry latency” and “spurious wake up that can be discarded”]
2nd test: entry latency (multi cpu)
[Figure: chart with annotations “sleep duration becomes longer than wake up latency” and “1st abort point”]
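The steps visible in these charts can be located automatically. The sketch below is one possible approach and not the method used for the talk: it walks (sleep duration, phase duration) pairs in increasing sleep order and reports the sleep durations at which the phase grows by clearly more than the added sleep. The function name, the threshold and the data are all hypothetical.

```python
def find_steps(samples, min_jump_us):
    """Return sleep durations at which the phase duration jumps.

    samples: list of (sleep_us, phase_us) tuples sorted by sleep_us.
    A jump is counted when the phase grows by more than min_jump_us
    on top of the increase in sleep duration itself.
    """
    steps = []
    for (prev_sleep, prev_phase), (sleep, phase) in zip(samples, samples[1:]):
        extra = (phase - prev_phase) - (sleep - prev_sleep)
        if extra > min_jump_us:
            steps.append(sleep)
    return steps


if __name__ == "__main__":
    # Invented data: the phase tracks the sleep duration until 400us,
    # where the full entry/exit sequence starts to be paid.
    data = [(100, 150), (200, 250), (300, 350),
            (400, 1000), (500, 1100), (600, 1200)]
    print("steps at sleep durations (us):", find_steps(data, min_jump_us=200))
```

Spurious wake-ups, which the single-CPU chart says can be discarded, would need to be filtered out before running such a detection.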
2nd test: entry latency
● Wake up duration includes
  ○ rt-app task events
  ○ Entry latency
  ○ Extra sleep time
  ○ Exit latency
● Steps in charts
  ○ Show the different abortable points

| Latency (us) | big CLUSTER | big CPU | big WFI | little CLUSTER | little CPU | little WFI |
|---|---|---|---|---|---|---|
| entry | 900 | 400 | ~0 | 500 | 400 | ~0 |
All latencies

| Latency (us) | big CLUSTER | big CPU | big WFI | little CLUSTER | little CPU | little WFI |
|---|---|---|---|---|---|---|
| entry | 800 | 400 | ~0 | 500 | 400 | 0 |
| exit | 2900 | 550 | 70 | 1600 | 650 | 100 |
| wake up | 3700 | 950 | 70 | 2100 | 1050 | 100 |
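The wake up row is the sum of the entry and exit rows, which matches the idle-states DT binding, where wakeup-latency-us defaults to entry-latency-us + exit-latency-us when omitted. A quick check of that arithmetic on the values from the table, treating "~0" as 0 (the dictionary layout is just a convenience for this sketch):

```python
# Values in microseconds, copied from the "All latencies" table; "~0" is taken as 0.
ENTRY_US = {
    "big": {"CLUSTER": 800, "CPU": 400, "WFI": 0},
    "little": {"CLUSTER": 500, "CPU": 400, "WFI": 0},
}
EXIT_US = {
    "big": {"CLUSTER": 2900, "CPU": 550, "WFI": 70},
    "little": {"CLUSTER": 1600, "CPU": 650, "WFI": 100},
}


def wakeup_latency(entry, exit_):
    """Return entry + exit latency per cluster and per idle state."""
    return {
        cluster: {
            state: entry[cluster][state] + exit_[cluster][state]
            for state in entry[cluster]
        }
        for cluster in entry
    }


if __name__ == "__main__":
    for cluster, states in wakeup_latency(ENTRY_US, EXIT_US).items():
        for state, value in states.items():
            print(f"{cluster:6} {state:8} wake up = {value} us")
```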
Residency time
C-state residency
● Residency time
  ○ Minimum idle time above which it’s worth selecting the C-state
● Estimated idle duration
  ○ Select the C-state with the longest residency time that fits
● Wakeup latency
  ○ Skip C-states that exceed the allowed wake-up latency

[Figure: timelines of Exec, Prepare, Entry, Idle and Exit phases for different idle durations]
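The selection rule described on this slide can be sketched as follows. This is a deliberately simplified model and not the kernel's governor code: among the states whose residency fits in the estimated idle duration, and whose wake-up latency respects an optional limit, pick the one with the longest residency. The state values are the big-cluster measurements at the lowest OPP from this talk, with a residency of 0 assumed for WFI, where the table says N/A.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IdleState:
    name: str
    wakeup_latency_us: int
    residency_us: int


def select_state(states: List[IdleState],
                 predicted_idle_us: int,
                 latency_limit_us: Optional[int] = None) -> Optional[IdleState]:
    """Pick the state with the longest residency that still fits the constraints."""
    best = None
    for state in states:
        if state.residency_us > predicted_idle_us:
            continue  # not worth entering for such a short idle period
        if latency_limit_us is not None and state.wakeup_latency_us > latency_limit_us:
            continue  # would break the latency constraint
        if best is None or state.residency_us > best.residency_us:
            best = state
    return best


# Big cluster, lowest OPP; WFI residency of 0 is an assumption (table says N/A).
BIG_STATES = [
    IdleState("WFI", 70, 0),
    IdleState("CPU", 950, 1500),
    IdleState("CLUSTER", 3700, 5000),
]

if __name__ == "__main__":
    for idle_us in (100, 2000, 10000):
        print(idle_us, "->", select_state(BIG_STATES, idle_us).name)
    print("10000 with 1000us limit ->",
          select_state(BIG_STATES, 10000, latency_limit_us=1000).name)
```

The example makes the stakes of the table visible: an overestimated residency keeps a deep state from being chosen, and an underestimated one selects it for idle periods that are too short to pay back the transition.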
How to estimate residency time?
● Measure each step precisely and independently
  ○ Energy consumed during each step of each state
  ○ Isolate the CPUs’ power domain from the others
● This implies
  ○ Having access to all power domains
  ○ Having very precise power meters (some steps are short, transient and difficult to measure)
● Don’t really care about absolute values
  ○ Just want to compare idle states with each other
● Don’t really care about the power impact of each step
  ○ Only interested in the end result
How to estimate residency time?
● Wake up the CPU periodically and measure power consumption
  ○ Task doesn’t do anything other than wake up and sleep
    ■ Power impact is mainly the entry/exit sequence
  ○ With decreasing periods, entry and exit steps take on more and more importance
  ○ Run the same number of wakeup/sleep sequences
    ■ Thousands of times
    ■ Relaxes the power meter precision constraint
  ○ Don’t need access to a dedicated power domain
    ■ Only interested in the difference
    ■ Side and noise power consumption cancel out as long as they are stable across tests
How to estimate residency time?
● Use rt-app to generate periodic wake-ups
  ○ Task doesn’t do anything other than wake up and sleep
  ○ Run the thread with a decreasing period
    ■ 10ms down to 1ms with a step of 0.5ms has been used for hikey960
● Minimize the impact of background activity of the other cluster(s)
  ○ Enable only WFI
  ○ Use the lowest OPP
● Run long enough (20 seconds) and several times (x8)
  ○ Filter out background activity of the system
  ○ Keep the iteration with the min value
  ○ Test is really long: more than 3 days of continuous tests for hikey960
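The period sweep described above (10ms down to 1ms in 0.5ms steps, 20 seconds per run) could be generated as a set of rt-app JSON descriptions. The sketch below is an assumption-laden illustration: the key names ("tasks", "loop", "cpus", "timer" with "ref" and "period", "global" with "duration") follow my understanding of rt-app's JSON grammar and should be checked against its documentation, and the task name, target CPU and helper names are hypothetical. The actual scripts behind the talk are not shown on the slides.

```python
import json


def sweep_periods_us(start_us=10_000, stop_us=1_000, step_us=500):
    """Return the decreasing list of wake-up periods, both ends included."""
    return list(range(start_us, stop_us - 1, -step_us))


def rtapp_config(period_us, cpu, duration_s=20):
    """Build a dict for a task that only wakes up periodically on one CPU."""
    return {
        "tasks": {
            "waker": {                       # hypothetical task name
                "loop": -1,                  # repeat until the global duration expires
                "cpus": [cpu],               # pin the thread to the CPU under test
                "timer": {"ref": "unique", "period": period_us},
            }
        },
        "global": {"duration": duration_s},
    }


if __name__ == "__main__":
    periods = sweep_periods_us()
    print(len(periods), "periods from", periods[0], "to", periods[-1], "us")
    print(json.dumps(rtapp_config(periods[0], cpu=4), indent=2))
```

With 19 periods, 8 iterations and 20 seconds per run, one sweep already takes 19 × 8 × 20 s = 3040 s, so the multi-day total presumably comes from repeating the sweep across idle states, OPPs and clusters.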
3rd test: residency time
[Figure: chart with annotations “wake up latency for cluster off”, “break even point between cluster and cpu off” and “break even point between cpu off and WFI”]
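A break-even point like the ones annotated on this chart can be estimated from two measured curves. The sketch below is one possible post-processing and not the talk's own scripts: it keeps the minimum reading of the repeated runs for each period, then interpolates linearly the period at which the deeper state stops costing more than the shallower one. All names and numbers are hypothetical, and the readings are in arbitrary units since only the difference matters.

```python
def min_per_period(runs):
    """runs: {period_us: [reading, ...]} -> {period_us: lowest reading}."""
    return {period: min(values) for period, values in runs.items()}


def break_even_us(shallow, deep):
    """Return the period where `deep` becomes no more costly than `shallow`.

    Both arguments map period_us -> reading. The crossing is linearly
    interpolated between the two surrounding periods; None if no crossing.
    """
    periods = sorted(set(shallow) & set(deep))
    diffs = [(p, deep[p] - shallow[p]) for p in periods]
    for (p0, d0), (p1, d1) in zip(diffs, diffs[1:]):
        if d0 > 0 >= d1:
            return p0 + (p1 - p0) * d0 / (d0 - d1)
    return None


if __name__ == "__main__":
    # Invented readings: three iterations per period for each state.
    shallow_runs = {1000: [10, 11, 12], 2000: [10, 10, 11], 3000: [10, 12, 11]}
    deep_runs = {1000: [14, 15, 16], 2000: [11, 12, 11], 3000: [9, 10, 9]}
    result = break_even_us(min_per_period(shallow_runs), min_per_period(deep_runs))
    print("break-even period (us):", result)
```

The break-even period is the natural candidate for the residency value of the deeper state, which is how the residency table in this talk can be read.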
3rd test: residency time
[Figure: chart with annotations “wake up latency for cluster off” and “break even point between cpu off and WFI”]
Residency

| Residency (us) | big CLUSTER | big CPU | big WFI | little CLUSTER | little CPU | little WFI |
|---|---|---|---|---|---|---|
| Lowest OPP | 5000 | 1500 | N/A | 8000 | 4500 | N/A |
| Highest OPP | 0 | 1500 | N/A | 0 | 1500 | N/A |
3rd test: residency time
● Residency time differs widely with OPP
● Understandable when we look at the “static” power consumption
  ○ big core @ lowest OPP: cluster off is 8% < WFI (absolute value)
  ○ big core @ highest OPP: cluster off is 25% < WFI (absolute value)
  ○ Need to weight the residency time value of each OPP by the % saved
● New residency value means increased usage of the cluster off state
  ○ Can see some responsiveness increases
  ○ 20ms residency time for cluster off versus 16ms for display sync event
  ○ Use CPU latency constraint instead: per CPU or system wide
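For the system-wide CPU latency constraint mentioned in the last bullet, Linux exposes PM QoS through /dev/cpu_dma_latency: a process writes a 32-bit signed value in microseconds and the request lasts as long as the file stays open. A per-CPU limit is available through /sys/devices/system/cpu/cpu<N>/power/pm_qos_resume_latency_us. The sketch below covers the system-wide case; the helper name and the 1000us figure are illustrative assumptions, not values recommended by the slides.

```python
import contextlib
import os
import struct


def encode_latency_us(limit_us):
    """Encode the limit as the 32-bit signed binary value the device accepts."""
    return struct.pack("=i", limit_us)


@contextlib.contextmanager
def cpu_latency_limit(limit_us, path="/dev/cpu_dma_latency"):
    """Hold a system-wide CPU latency request for the duration of the with-block.

    The request is dropped when the file descriptor is closed. Opening the
    real device node usually requires root.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, encode_latency_us(limit_us))
        yield
    finally:
        os.close(fd)


# Usage sketch (not executed here because it needs the real device node):
#
#     with cpu_latency_limit(1000):
#         run_latency_sensitive_work()   # hypothetical function
```

A limit below the cluster off latency would keep that state out of use only while the latency-sensitive work runs, instead of inflating its residency value for everyone.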
Conclusion
● More rt-app test cases can be used:
  ○ With memory events, for example
    ■ No real difference has been shown
● OPP has a significant impact on residency time
● Scripts will be publicly available soon
  ○ Run tests and gather results
● Next steps
  ○ Automate chart creation
  ○ Automate extraction of entry, exit and residency values
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org