xpds13: performance optimization on xen-based android device - jack ren, intel and xiantao zhang,...

Performance Optimization on Xen-based Android Device. Jack Ren/Xiantao Zhang/Dongxiao Xu. Key contributor: Eddie Dong. Intel Corporation

Upload: the-linux-foundation

Posted on 11-Jun-2015


DESCRIPTION

Mobile devices such as smartphones and tablets are becoming the de facto everyday computing and communication devices, and virtualization can bring them additional benefits in both security and manageability. An IT department may use a hypervisor, as a highly secure solution, to manage authorized mobile devices, for example for network traffic monitoring, filtering, virus scanning, and/or OS updates and patching, even when the guest OS has completely crashed. We insert Xen beneath Android, deprivileging Android into a guest for security and manageability. However, mobile use cases are quite different from server use cases: mobile devices run completely different benchmarks (mostly multimedia-focused) from those used on servers (mostly responsiveness-focused). We analyze the gaps in Xen as a mobile hypervisor and present how we improved its performance.

TRANSCRIPT

Page 1: XPDS13: Performance Optimization on Xen-based Android Device - Jack Ren, Intel and Xiantao Zhang, Intel

Performance Optimization on Xen-based Android Device

Jack Ren/Xiantao Zhang/Dongxiao Xu

Key contributor: Eddie Dong

Intel Corporation


Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2012 Intel Corporation.


Agenda

• Overview

• Design Details

• Gaps, Analysis & Optimizations

• Summary



Overview

• Back to Xen Summit 2011 in Seoul…

“Mobile virtualization will be more important…Xen has unique advantages there”

- “Mobile Virtualization using the Xen Technologies”, Jun Nakajima, Intel.

Jun also proposed a Xen-based Android system:

[Figure: Xen-based Android system proposed by Jun Nakajima]


Overview (continued)

• New use case: Android in Dom0, hypervisor as TEE

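[Figure: architecture diagram. Dom0 holds the Android userland (ring 3: Gallery, VideoPlayer, Browser …), the Android framework (Surface Manager, OpenGLES, Dalvik) and the Android kernel (ring 1). Xen runs in ring 0 and provides Virtual CPU, Virtual MMU and Virtual IRQ. GFX, Video and PM blocks also appear in the diagram.]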

But we don’t want to sacrifice much performance or power

TEE: Trusted Execution Engine


Design Details

Virtualization performance

Component     Approach                                            Performance
I/O           Pass-through to Android                             Close to native performance (but Quadrant I/O: 21% degradation)
CPU           vCPUs pinned to physical CPUs                       Eliminates the vCPU scheduling penalty
MMU           Para-virtualized                                    Good run-time performance
IRQ           Xen owns; dispatched to Android via event channel   Main overhead: ring switch
FPU           Para-virtualized                                    No vCPU scheduling, very good performance
CpuIdle       Pass-through to Android                             Completely consistent with Android PM
CpuFreq       Pass-through to Android                             Same as above
Standby (S3)  Pass-through to Android                             Same as above

• Android runs almost natively

Standby (S3) is a little bit tricky…


Design Details (continued)

Re-design S3

• Dom0 owns the full suspend/resume logic.

• Xen assists Dom0 in issuing the real monitor/mwait.

• S3 resume is 2X faster than native.

[Figure: S3 timeline for CPU0 to CPU3, labeled with HYPERVISOR_vcpu_op(VCPUOP_down), do_mwait_suspend(), mwait, sleep, "wake up CPU0" and HYPERVISOR_vcpu_op(VCPUOP_up)]


Preliminary Power (normalized)

• > 90% of benchmarks reach 95% of native power

[Chart: Power KPIs, normalized to native; y-axis from 80% to 105%]

But we still identified several gaps…

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance


Preliminary Performance (normalized)

[Chart: Performance KPIs, normalized to native; y-axis from 0% to 120%. Benchmarks: EEMBC CoreMark, Dhrystone, CaffeineMark, iSPEC00 - speed, Micro Benchmark (two entries), AnTuTu 2.9.4 CPU Int, SunSpider, EEMBC BrowsingBench, Browsermark, Octane, FishIE Tank - 200M, BaseMark ES2v1 Taiji, BaseMark ES2v1…, SmartBench2012, Quadrant 2D, Quadrant 3D, Quadrant IO, GLBenchmark 2.5.1… (two entries), Cold Boot time to…, H.264/MPEG-4 AVC…, H.264 video record, 3G HSDPA download, WLAN download, CF-Bench (malloc), USB MTP read large…, USB MTP write]

But we still identified several gaps and need some tools to help us…

• > 90% of benchmarks reach 97% of native performance



Tools Enabling

Enabled a number of tools for performance tuning:

• VTune

− Based on PMU, mainly used to tune Dom0

• Xentrace

− Based on original Xentrace, but revised to count key events and hypercalls

• Perf

− Based on PMU, mainly used to tune Dom0

• Xenoprof

− Based on PMU, mainly used to tune Xen

These tools proved very helpful in the later performance and power tuning.


Case #1: Quadrant I/O (perf)

Gap: 21%

• Analysis:

Storage data is cached in the page cache, which is allocated from high memory (highmem). Each page-cache access needs a kmap/kunmap, which leads to a large number of PVMMU hypercalls.

• Optimizations:

− Shrink the Xen memory footprint from 168 MB to 72 MB, leaving more low memory for Dom0

− Force the page cache to be allocated from low memory

• Gap reduced to 8.5%

Can we continue to optimize and close that gap of 8.5%?


Case #1: Quadrant I/O (perf), continued

Profiled by VTune

Of the 8.5%: Xen overhead = 134/3138 ≈ 4.27%

Xen traces

Of the 4.27%: PVMMU overhead ≈ 70.88%

It is hard to close the 8.5% gap further because of the PVMMU overhead

type   name              count         cost   cost%
hcall  mwait_idle_op      3759  37142118744       -
hcall  multicall         12147    145492506  32.12%
hcall  mmu_update        27126    113270256  25.00%
hcall  mmuext_op          7781     50615724  11.17%
hcall  vcpu_op            6577     39658986   8.75%
hcall  event_channel_op   3405     26617650   5.88%
hcall  xen_version        4937     12374700   2.73%
event  PAGE_FAULT         9764     11719224   2.59%
event  IRQ                1119     10178934   2.25%
hcall  event_channel_op   1259      9081834   2.00%
hcall  physdev_op         1692      8251512   1.82%
hcall  event_channel_op    840      7024398   1.55%
hcall  event_channel_op    761      6150300   1.36%
event  TIMER_IRQ           472      5745738   1.27%
hcall  event_channel_op    545      4361118   0.96%
event  TRAP               1038      1040916   0.23%
event  PRIVOP             1032       872700   0.19%
hcall  fpu_taskswitch     1038       439638   0.10%
hcall  undefined            21       102672   0.02%
hcall  apic_op               3         5484   0.00%

total cost (excluding mwait_idle_op): 453004290


Case #2: Home Screen Scroll (power)

Gap: 1.2%

Profiled by VTune

Xen overhead = 30/3176 ~= 1%

type   name              count         cost   cost%
event  IRQ                1843     18323532   7.04%
event  TRAP                 88       131352   0.05%
event  PAGE_FAULT          943      3237852   1.24%
event  PRIVOP             1385       533748   0.21%
event  TIMER_IRQ           144      2062704   0.79%
hcall  mmu_update          990      8866296   3.41%
hcall  fpu_taskswitch       95        66816   0.03%
hcall  multicall          8736    109199952  41.96%
hcall  xen_version        3914     10860348   4.17%
hcall  vcpu_op            9694     55009236  21.13%
hcall  mmuext_op          3858     34409052  13.22%
hcall  event_channel_op   1188     10105920   3.88%
hcall  physdev_op         1078      7469256   2.87%
hcall  mwait_idle_op      3938  23493503868       -

total cost (excluding mwait_idle_op): 260276064

cost of PAGE_FAULT + mmu_update + multicall + mmuext_op: 155713152 (59.83%)

Xen traces

Of the 1%: PVMMU overhead = 59.83%

PVMMU overhead again…


Other Gaps

Other cases show similar Xen overheads:

• PVMMU

• TLS/stack switching

Some cases could be optimized by reducing the number of hypercalls through guest optimizations

• For example, Quadrant I/O

Other cases are hard to optimize because of the PV overhead

• For example, CF-Bench malloc

These could be fixed by an HVM Dom0


Summary

• Dom0 Android achieved near-native power and performance

• Some power and performance gaps caused by PVOPS remain:

− PVMMU

− TLS/Stack switch

• These gaps could be fixed by an HVM Dom0


Q & A

• Questions?

• or contact

Jack Ren

Xiantao Zhang