Slide 1

Slide 1 text

ANALYZING LATENCY OF I/O EVENTS
ARCHIT SHARMA
ASSOCIATE PERFORMANCE ENGINEER
BLR | Red Hat India Pvt. Ltd.

Slide 2

Slide 2 text

THINGS WE'RE GONNA TALK ABOUT
- An I/O use case
  - The investigation: block I/O events, native vs. threads, in Qemu-KVM
- IOPS performance benchmarking/debugging
  - General approaches
  - Tools/utilities we've rolled out, which include:
    - benchmarking IOPS
    - postprocessing that data
- Applicability of latency analysis

Slide 3

Slide 3 text

USE CASE: I/O EVENTS IN QEMU-KVM
Block I/O and File I/O
- Is the delay being produced by the filesystem or by the KVM layer?
- IO engines: how does async compare to sync?
- How does a setup with target:threads compare to one with target:native for a given kernel version?
- Would I achieve better results if I changed iodepth?

Slide 4

Slide 4 text

BLOCK I/O EVENTS IN QEMU-KVM
An investigation of block I/O events: tracing and analyzing them.
[Native] kvm_exit -> sys_exit_ppoll -> sys_enter_io_submit -> sys_exit_io_submit .. .. -> sys_enter_io_getevents -> sys_exit_io_getevents
Came up with a couple of utilities to help analyze I/O latency.
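To make the idea of an event loop concrete, here is a minimal sketch of how the time spent between consecutive events in the native loop could be measured. It is not the code of the utilities mentioned on the slide. The `(timestamp, event_name)` input format and the function name `loop_latencies` are assumptions made for illustration, and the steps elided as ".." on the slide are left out rather than guessed.

```python
from typing import Iterable, List, Sequence, Tuple

# Event loop as listed on the slide (native path); the ".." steps are omitted.
NATIVE_LOOP = [
    "kvm_exit",
    "sys_exit_ppoll",
    "sys_enter_io_submit",
    "sys_exit_io_submit",
    "sys_enter_io_getevents",
    "sys_exit_io_getevents",
]


def loop_latencies(
    events: Iterable[Tuple[float, str]], loop: Sequence[str]
) -> List[List[float]]:
    """Return one list of step-to-step deltas per complete pass through `loop`.

    `events` is a hypothetical, already-parsed trace: (timestamp, event_name)
    pairs in chronological order. Events that are not the next expected step
    are ignored, except that seeing the first step again restarts the pass.
    """
    passes: List[List[float]] = []
    stamps: List[float] = []
    for ts, name in events:
        if name == loop[len(stamps)]:
            stamps.append(ts)      # next expected step: record its timestamp
        elif name == loop[0]:
            stamps = [ts]          # loop restarted before completing
        if len(stamps) == len(loop):
            # Complete pass: latency of each step is the gap to the next event.
            passes.append([b - a for a, b in zip(stamps, stamps[1:])])
            stamps = []
    return passes


# Made-up timestamps, for illustration only.
trace = [
    (0.0, "kvm_exit"),
    (1.0, "sys_exit_ppoll"),
    (3.0, "sys_enter_io_submit"),
    (6.0, "sys_exit_io_submit"),
    (10.0, "sys_enter_io_getevents"),
    (15.0, "sys_exit_io_getevents"),
]
print(loop_latencies(trace, NATIVE_LOOP))
```

Each inner list holds one delta per arrow in the loop, so the gap between sys_enter_io_submit and sys_exit_io_submit is the time spent inside that syscall for one pass.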

Slide 5

Slide 5 text

IOPS PERFORMANCE BENCHMARKING/DEBUGGING: GENERAL APPROACHES
- IOPS benchmarking: FIO
  - Our addon: pbench_fio
- Debugging: perf-tools (widely used)
  - Our addon: I/O event loop latency processor

Slide 6

Slide 6 text

LINUX PERF ANALYSIS TOOLS TIMELINE
[Timeline figure. Year labels: 2001, 2003, 2009, 2015, 2016. Tools named: pbench, perf-script postprocessor.]

Slide 7

Slide 7 text

PBENCH
http://distributed-system-analysis.github.io/pbench/
A Benchmarking and Performance Analysis Framework
- Allows commonly used and even custom benchmarking scripts!
- Dynamic visualizations enabling hands-on exploration and deeper insights into potential bottleneck regions
- Easy to use and set up
- Exciting upcoming features..
- Open for contributions!

Slide 8

Slide 8 text

PBENCH
http://distributed-system-analysis.github.io/pbench/
A Benchmarking and Performance Analysis Framework
1. A collection agent (pbench-agent): handles TLC (Telemetry, Logs and Configurations)
2. Background tasks (bgtasks): archive result tarballs, index them, and unpack them for display
3. Web server: displays various graphs and results

Slide 9

Slide 9 text

PBENCH
http://distributed-system-analysis.github.io/pbench/
A Benchmarking and Performance Analysis Framework

Slide 10

Slide 10 text

PERF SCRIPT POSTPROCESSOR: A DEBUGGING TOOL
- Hands-on tracing with a flexible approach: specify your own event loops!
- Lots of use cases: disk I/O, network I/O, ..
- A statistical, descriptive and visual approach to latency analysis
- Available on PyPI!
  $ pip install perf-script-postprocessor
Github: arcolife/perf-script-postprocessor

Slide 11

Slide 11 text

PERF SCRIPT POSTPROCESSOR: A DEBUGGING TOOL
(perf tools) $ perf kvm record
  -> generates binary data file perf.data
  -> $ perf_script_processor
  -> {mean, median, std_deviation} of event loop latencies
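The last stage of this pipeline reduces the latencies collected for each step of an event loop to mean, median and standard deviation. A minimal sketch of that reduction, using only Python's standard library, follows. The function name `summarize`, the dictionary layout, and the sample values are assumptions for illustration. The slide does not say whether the tool reports population or sample standard deviation, so the population form (`statistics.pstdev`) is used here as an assumption.

```python
import statistics
from typing import Dict, List


def summarize(latencies: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reduce per-step latency samples to mean, median and std deviation.

    `latencies` maps a step label (hypothetical naming) to the samples
    observed for that step across many passes through the event loop.
    """
    return {
        step: {
            "mean": statistics.mean(samples),
            "median": statistics.median(samples),
            # Assumption: population standard deviation.
            "std_deviation": statistics.pstdev(samples),
        }
        for step, samples in latencies.items()
    }


# Made-up samples, for illustration only.
samples = {"sys_enter_io_submit -> sys_exit_io_submit": [12.0, 15.0, 11.0, 14.0]}
for step, stats in summarize(samples).items():
    print(step, stats)
```

Reporting the median next to the mean helps when a few slow passes skew the average, which is common in I/O traces.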

Slide 12

Slide 12 text

PERF SCRIPT POSTPROCESSOR: A DEBUGGING TOOL

Slide 13

Slide 13 text

ADDITIONAL UTILS: KVM_IO - BENCH_ITER.SH
Example Results Layout

[root@perf results]# ls
1/  2/  3/  4/  5/  perf_record_.txt  perf_kvm_record_.txt  perf_trace_.txt  strace_.txt
[root@perf results]# ls 1/
output_perf_trace  output_strace  perf_record.data  perf_kvm_record.data  results_1_perf_record_  results_1_perf_trace_  results_1_perf_trace_record_  results_1_strace_
[root@perf results]# cat perf_record_.txt
Min: 160756.05
Max: 177846.30
Avg: 170572.8880
Std Dev %: 3.7418
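A summary such as perf_record_.txt can be derived from the per-iteration results with a few lines of code. The sketch below is not the script's actual implementation. It assumes that "Std Dev %" means the standard deviation expressed as a percentage of the average, and that the population form is used; neither is stated on the slide. The function name and the input values are made up for illustration.

```python
import statistics
from typing import Sequence


def summarize_iterations(iops: Sequence[float]) -> str:
    """Format min, max, average and relative std deviation of IOPS values.

    `iops` holds one hypothetical IOPS figure per benchmark iteration.
    """
    avg = statistics.mean(iops)
    # Assumption: "Std Dev %" is the std deviation as a percentage of the mean.
    std_dev_pct = 100.0 * statistics.pstdev(iops) / avg
    return (
        f"Min: {min(iops):.2f} Max: {max(iops):.2f} "
        f"Avg: {avg:.4f} Std Dev %: {std_dev_pct:.4f}"
    )


# Made-up values for five iterations, for illustration only.
print(summarize_iterations([100.0, 110.0, 90.0, 100.0, 100.0]))
```

A low relative deviation across iterations suggests the runs are repeatable enough to compare one tracing setup against another.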

Slide 14

Slide 14 text

ADDITIONAL UTILS: LATENCY_ANALYZER
"Swiss knife for getting started with [native] [file I/O] latency analysis [for Qemu-KVM]" - Chewbacca
"I love this script!" - Luke Skywalker
"pfft.. Whatever" - Darth Vader
Github: arcolife/latency_analyzer

Slide 15

Slide 15 text

WHY ANALYZE LATENCY?
- Code optimization
  - e.g. OS profiling
- Distributed computing
  - latency distributions
- Cache tuning
  - distributed cache performance (timed cache access)^N
- Web performance; high latency may involve:
  - Load balancing
  - Network latency
  - Web server configuration
- Performance engineering (throughput & latency)
- Databases
  - recommended I/O schedulers
  - memory / caching
- Virtualization
  - Block and file I/O
- Networking
  - Network I/O
- ..

Slide 16

Slide 16 text

FOOD FOR THOUGHT?
1. How much time is spent on each event while control is in user space vs. kernel space?
2. Sorting out anomalies: IOPS throughput differs depending on whether strace, perf record, .. is attached. At the same time, nr values should be long (they're not when using perf record).
3. .. ?

Slide 17

Slide 17 text

THANKS!!
Twitter: @arcolife
Website: http://work.arcolife.in/
LinkedIn: https://www.linkedin.com/in/arcolife