1 analyzing latency of i/o events - schedschd.ws/hosted_files/devconfcz2016/3a/archit sharma...

17
1 ANALYZING LATENCY ANALYZING LATENCY OF OF I/O EVENTS I/O EVENTS ARCHIT SHARMA ARCHIT SHARMA ASSOCIATE PERFORMANCE ENGINEER ASSOCIATE PERFORMANCE ENGINEER BLR | Red Hat India Pvt. Ltd.

Upload: phamnhu

Post on 26-May-2018

233 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

1

ANALYZING LATENCYANALYZING LATENCYOFOF

I/O EVENTSI/O EVENTS

ARC H I T S HAR MAARC H I T S HAR MAASSOCIATE PERFORMANCE ENGINEERASSOCIATE PERFORMANCE ENGINEER

BLR | Red Hat India Pvt. Ltd.

Page 2: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

2

An I/O use case

The investigation:

Block I/O events

native vs. threads in Qemu-KVM

IOPS performance benchmarking/debugging

General approaches

Tools/utilities we've rolled out:

includes benchmarking IOPS

postprocessing that data

Applicability of Latency analysis

THINGS WE'RE GONNATHINGS WE'RE GONNATALK ABOUTTALK ABOUT

Page 3: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

3

Whether the delay is being produced by filesystem /kvm layer?

IO engines: How does async compare to sync ?How does a setup with target:threads compare toone with target:native for a kernel version?

Would I achieve better results if I changed iodepth?

Block I/O and File I/O

USE CASEUSE CASE I/O EVENTS IN QEMU-KVM I/O EVENTS IN QEMU-KVM

Page 4: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

4

[Native]

kvm_exit -> sys_exit_ppoll -> sys_enter_io_submit -> sys_exit_io_submit .... -> sys_enter_io_getevents -> sys_exit_io_getevents

BLOCK I/O EVENTS IN QEMU-BLOCK I/O EVENTS IN QEMU-KVMKVM

An investigation of blockIO events: tracing and analyzing them

Came up with a couple of utilities to help analyze I/O latency..

Page 5: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

5

GENERAL APPROACHESGENERAL APPROACHES

IOPS Benchmarking -

Our addon:

Debugging: Widely used

Our addon: I/O Event

FIO

pbench_fio

perf-tools

loop latency processor

IOPS PERFORMANCE BENCHMARKING/DEBUGGINGIOPS PERFORMANCE BENCHMARKING/DEBUGGING

Page 6: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

6

20032003

20012001 20092009 20162016

20152015

LINUX PERF ANALYSIS TOOLS TIMELINELINUX PERF ANALYSIS TOOLS TIMELINE

pbench

---------------

perf-scriptpostprocessor

Page 7: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

7

PBENCHPBENCH

http://distributed-system-analysis.github.io/pbench/

A Benchmarking and Performance Analysis Framework

Allows commonly used / even custom benchmarkingscripts!

Dynamic visualizations enabling hands-on explorationand deeper insights into potential bottleneck regions

Easy to use and setup

Exciting upcoming features..

Open for contributions!

Page 8: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

8

PBENCHPBENCH

http://distributed-system-analysis.github.io/pbench/

A Benchmarking and Performance Analysis Framework

1 A collection agent (pbench-agent) -> Handles TLC- Telemetry, Logs and Configurations

2 Background tasks (bgtasks) -> Archives result tarballs, indexes them, and unpacks them for display.

3 Web server -> display various graphs and results

Page 9: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

9

PBENCHPBENCH

http://distributed-system-analysis.github.io/pbench/

A Benchmarking and Performance Analysis Framework

Page 10: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

10

Hands-on tracing with flexible approach

specify your own event loops!

Lots of use cases - disk I/O, network I/O, ..

A statistical, descriptive and visual approach tolatency analysis

Available on pypi!

$ pip install perf-script-postprocessor

PERF SCRIPT POSTPROCESSORPERF SCRIPT POSTPROCESSOR

A DEBUGGING TOOLA DEBUGGING TOOLGithub: arcolife/perf-script-postprocessor

Page 11: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

11

PERF SCRIPT POSTPROCESSORPERF SCRIPT POSTPROCESSOR

A DEBUGGING TOOLA DEBUGGING TOOL

(PERF TOOLS) - $ PERF KVM RECORD(PERF TOOLS) - $ PERF KVM RECORD

GENERATES BINARY DATA FILEGENERATES BINARY DATA FILE

PERF.DATAPERF.DATA

$ PERF_SCRIPT_PROCESSOR$ PERF_SCRIPT_PROCESSOR

{MEAN, MEDIAN, STD_DEVIATION}{MEAN, MEDIAN, STD_DEVIATION}EVENT LOOP LATENCIESEVENT LOOP LATENCIES

Page 12: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

12

PERF SCRIPT POSTPROCESSORPERF SCRIPT POSTPROCESSOR

A DEBUGGING TOOLA DEBUGGING TOOL

Page 13: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

13

ADDITIONAL UTILSADDITIONAL UTILSKVM_IO - BENCH_ITER.SHKVM_IO - BENCH_ITER.SH

[root@perf results]# ls1/ 2/ 3/ 4/ 5/ perf_record_.txt perf_kvm_record_.txtperf_trace_.txt strace_.txt

[root@perf results]# ls 1/output_perf_trace output_strace perf_record.data perf_kvm_record.dataresults_1_perf_record_ results_1_perf_trace_ results_1_perf_trace_record_ results_1_strace_

[root@perf results]# cat perf_record_.txt Min: 160756.05Max: 177846.30Avg: 170572.8880 Std Dev %: 3.7418

Example Results Layout

Page 14: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

14

ADDITIONAL UTILSADDITIONAL UTILSLATENCY_ANALYZERLATENCY_ANALYZER

“ swiss knife for getting started with [native][file I/O] latency analysis [for Qemu-KVM]

- Chewbacca

“ I love this script!- Luke Skywalker

“ pfft..Whatever- Darth Vader

Github: arcolife/latency_analyzer

Page 15: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

15

WHY ANALYZE L ATENCY ?WHY ANALYZE L ATENCY ?

Code Optimizationeg: OS profiling

Distributed Computinglatency distributions

Cache tuningdistributed cache performance(timed cache access)^N

Web Performancehigh latency may involve:

Load BalancingNetwork LatencyWeb server configuration

Performance Engineering(throughput & latency)

Databasesrecommended I/Oschedulersmemory / caching

VirtualizationBlock and File I/O

NetworkingNetwork I/O

..

Page 16: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

16

1 how much time spent on each event, WHILEcontrol is in user/kernel space

2 Sorting out anomalies: IOPS throughput differentwith strace, perf record .. At the same time, nrvalues should be long (they're not when usingperf record).

3 .. ?

FOOD FOR THOUGHT?FOOD FOR THOUGHT?

Page 17: 1 ANALYZING LATENCY OF I/O EVENTS - Schedschd.ws/hosted_files/devconfcz2016/3a/Archit Sharma Devconf_16... · 1 ANALYZING LATENCY OF I/O EVENTS ... $ pip install perf-script-postprocessor

17

THANKS!!THANKS!!

Twitter: @

Website: http://work.arcolife.in/

LinkedIn: https://www.linkedin.com/in/arcolife

arcolife