
Benchmarking the right way with Perfer

Benoit Daloze

August 17, 2012

Transcript

  1. Benchmarking the right way with Perfer Benoit Daloze August 17,

    2012
  2. Who am I? Benoit Daloze (GitHub: eregon, Twitter: @eregontp) A

    student at Université Catholique de Louvain (Belgium) An MRI committer, mainly interested in what’s happening for the next version and, of course, in anything benchmark-related
  3. The talk Benchmarking the right way . . . Factors

    Measuring Analyzing Existing tools . . . with Perfer Measurement Reporting Comparison Visualization Current status Future
  4. Context A complete benchmark tool (and suite) for Ruby implementations

    and libraries Google Summer of Code project http://www.google-melange.com/gsoc/project/ google/gsoc2012/eregon/19001 Perfer: https://github.com/jruby/perfer RubyBench: https://github.com/jruby/rubybench
  5. Motivation Automated benchmarking is not so common in the Ruby

    community Current tools provide the basics, but usually don’t protect the user from mistakes and might lead to wrong interpretations Benchmarking such a dynamic language across implementations is hard, so it’s worth investigating the problems
  6. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools . . . with Perfer
  7. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  8. Factors Anything which can influence the execution environment: JIT Memory,

    GC Code loaded Context
  9. JIT: JRuby

    [Scatter plot: “JIT with DateTime.strptime(...) for JRuby”; x axis: time (s), y axis: time (s) / iteration]
  10. JIT: JRuby

    [Same plot as the previous slide, zoomed in on the y axis: the per-iteration time drops as the JIT warms up]
  11. JIT: Rubinius

    [Scatter plot: “JIT with DateTime.strptime(...) for Rubinius”; x axis: time (s), y axis: time (s) / iteration]
  12. JIT: MRI

    [Scatter plot: “JIT with DateTime.strptime(...) for MRI”; x axis: time (s), y axis: time (s) / iteration]
  13. JIT: MRI

    [Same plot as the previous slide, zoomed in on the y axis: no warm-up trend, only noise]
  14. Memory, GC and loading code

    [Bar chart: “Performance of recursive factorial(5000)” with nothing loaded / stdlib / one gem / rails, comparing time, GC average time, number of objects and memory used]
  15. Context Evaluation context is important Closures are typically a lot

    slower than static methods But they are the easiest way to pass data
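As a rough sketch of this point (not from the slides; the method and lambda names are invented for illustration), the standard Benchmark library can show the gap between calling a plain method and calling a closure:

```ruby
require 'benchmark'

# A plain method vs. an equivalent closure (both names are made up here).
def double_it(x)
  x * 2
end
double_closure = lambda { |x| x * 2 }

n = 1_000_000
Benchmark.bm(8) do |bm|
  bm.report("method")  { n.times { |i| double_it(i) } }
  bm.report("closure") { n.times { |i| double_closure.call(i) } }
end
```

The exact ratio depends heavily on the implementation and its JIT, which is precisely why the evaluation context matters when benchmarking.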
  16. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  17. Time Precision: t=Time.now; Time.now-t ⇒ 1 µs to 1 ms Hitimes::Interval.measure

    {} ⇒ 0.2 µs to 6 µs
  18. Time Precision: t=Time.now; Time.now-t ⇒ 1 µs to 1 ms Hitimes::Interval.measure

    {} ⇒ 0.2 µs to 6 µs Overhead: to yield a block of code: about 100 ns/iteration; for a while loop: about 50 ns/iteration; when repeating the code 100 times: less than 1 ns/iteration
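The Time.now trick on the slide can be turned into a tiny script (a sketch only; the observed resolution depends on the OS and the Ruby implementation):

```ruby
# Measure the apparent precision of Time.now by timing back-to-back calls,
# as in the slide's t = Time.now; Time.now - t idiom.
deltas = Array.new(1_000) do
  t = Time.now
  Time.now - t
end

smallest = deltas.reject { |d| d <= 0 }.min
puts "smallest non-zero delta: #{smallest} s" if smallest
```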
  19. Memory Platform-specific functions and ps(1). When? Between each

    iteration: too much overhead. Before/after a set of iterations: imprecise. A thread polling regularly: thread overhead and still imprecise.
  20. Memory Platform-specific functions and ps(1). When? Between each

    iteration: too much overhead. Before/after a set of iterations: imprecise. A thread polling regularly: thread overhead and still imprecise. Just an approximation, but GC statistics might tell the number of objects and internal heap usage. Useful to detect swapping.
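On MRI, the GC statistics mentioned here are exposed through GC.stat; a minimal sketch (key names vary between Ruby versions, so they are checked before use):

```ruby
# Sample GC counters around a workload to approximate allocation behaviour.
before = GC.stat[:count]
100_000.times { "abc".dup }   # placeholder workload that allocates objects
after = GC.stat[:count]

puts "GC runs during workload: #{after - before}"
live = GC.stat[:heap_live_slots]
puts "live heap slots: #{live}" if live
```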
  21. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  22. Statistics Statistics as a way to analyze data automatically To

    take error and uncertainty into account To know when the steady state is reached
  23. Distribution and outliers

    [Histogram: “Distribution of 1000 DateTime.strptime measurements”; x axis: Time (1.0–1.6), y axis: Density]
  24. Distribution

    [Same histogram, zoomed in on Time 1.00–1.10]
  25. Central tendency: mean, median

    [Density plot: “Mean and Median (10 measurements, strptime)”; x axis: Time, y axis: Density, with vertical lines marking the Median and the Mean]
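The difference between the two can be computed directly; a small sketch with made-up measurement values, where one outlier pulls the mean but barely moves the median:

```ruby
# Mean vs. median on ten fake timing samples (seconds); the last is an outlier.
times = [1.065, 1.068, 1.070, 1.071, 1.073, 1.074, 1.076, 1.078, 1.080, 1.120]

mean = times.sum / times.size
sorted = times.sort
mid = sorted.size / 2
median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

puts "mean:   #{mean}"    # pulled up by the 1.120 outlier
puts "median: #{median}"  # barely affected by it
```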
  26. Measures of dispersion Standard deviation Interquartile range Median absolute deviation

    Maximum absolute deviation
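All four measures can be sketched in plain Ruby (the sample values are invented; for an even-sized half, the quartile below is taken as the median of that half, which is one of several conventions):

```ruby
# Dispersion measures on a small sample containing one outlier.
data = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 5.0]

def median(xs)
  s = xs.sort
  m = s.size / 2
  s.size.odd? ? s[m] : (s[m - 1] + s[m]) / 2.0
end

mean = data.sum / data.size
std  = Math.sqrt(data.sum { |x| (x - mean)**2 } / data.size)  # standard deviation

s = data.sort
half = s.size / 2
iqr = median(s.last(half)) - median(s.first(half))            # interquartile range

med = data.then { |d| median(d) }
abs_devs = data.map { |x| (x - med).abs }
mad    = median(abs_devs)                                     # median absolute deviation
max_ad = abs_devs.max                                         # maximum absolute deviation

puts "std: #{std.round(3)}, IQR: #{iqr.round(3)}, MAD: #{mad.round(3)}, max AD: #{max_ad.round(3)}"
```

Note how the outlier inflates the standard deviation and the maximum absolute deviation, while the IQR and MAD stay small: the robust measures describe the bulk of the sample.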
  27. Measures of dispersion

    [Scatter plot: “JIT with DateTime.strptime(...) for JRuby”; x axis: time (s), y axis: time (s) / iteration]
  28. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  29. The standard library

     Benchmark.measure { code }
     # => #<Benchmark::Tms @real=1.1425, @utime=0.47, @stime=0.6>

     10.times { puts Benchmark.measure { code } }
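A runnable version of the slide's idea (the timed body is a placeholder; on the slide, code stands for the code under test):

```ruby
require 'benchmark'

# Measure a placeholder workload several times; printing each Benchmark::Tms
# makes the run-to-run variation visible, as the slide suggests.
results = Array.new(3) do
  Benchmark.measure { 100_000.times { |i| i.to_s } }
end
results.each { |tms| puts tms }
```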
  30. MRI benchmark/driver.rb and the suite Time measured is $ time

    ruby bench.rb Does not work well for JRuby because of VM invocation Not very stable: depends a lot on how little code is loaded and on initial memory usage Inequality of running times (0.03 seconds to 2 minutes) makes comparison difficult
  31. Other libraries benchmark-ips from Evan Phoenix viiite Many others for

    specific use-cases or similar to stdlib: rbench: comparisons diffbench: comparisons with a git workflow BenchmarkX: graphics rubyperf: measurements . . .
  32. Benchmarking the right way . . . . . .

    with Perfer Measurement Reporting Comparison Visualization Current status Future
  33. Perfer vocabulary Benchmark suite Sessions Jobs Two main benchmark types:

    iterative: precisely finds how much time some code without side effects takes input size based: when the input size is the natural parameter (e.g. Array#sort)
  34. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  35. Measurement API: iterations

     require 'tmpdir'
     dir = Dir.tmpdir

     Perfer.session "File.stat" do |s|
       s.iterate "Simple block" do
         File.stat(dir)
       end

       s.iterate "Block with given argument" do |n|
         i = 0
         while i < n
           File.stat(dir)
           i += 1
         end
       end

       s.iterate "String for eval", "File.stat(dir)", :dir => dir
     end
  36. Measurement API: input size

     Perfer.session "Array#sort" do |s|
       s.metadata do
         description "Sort an Array of random integers"
         tags Array, :sorting
         n "size of the Array"
         start 1024
         generator { |n| n * 2 }
       end

       s.bench "Array#sort" do |n|
         ary = Array.new(n) { rand(n) }
         s.measure do
           ary.sort
         end
       end
     end
  37. Measurement CLI

     $ perfer run examples/file_stat.rb
     Session File.stat with jruby 1.7.0.preview2
     Taking 10 measurements of at least 1.0s
     File.stat 7.244 µs/i ± 0.181 (2.5%) <=> 138037 ips

     $ perfer help
     [...]
     Common options:
       -t TIME  Minimal time to run (greater usually improves accuracy)
       -m N     Number of measurements per job
       -v       Verbose
  38. Measurement: metadata recorded

     :file: .../file_stat.rb
     :session: File.stat
     :ruby: jruby 1.7.0.preview2 (1.9.3p203) ...
     :command_line: ! '/usr/bin/java ... bin/perfer run examples/file_stat.rb'
     :run_time: 2012-08-14 20:15:30.463000000 +02:00
     :minimal_time: 1.0
     :measurements: 10
     :verbose: false
     :git_branch: master
     :git_commit: 3433deb6727e...
     :bench_file_checksum: f04210b5382a...
     :job: File.stat
     :iterations: 143700
  39. Measurement: data recorded

     - :real: 1.1425
       :utime: 0.47
       :stime: 0.6
     - :real: 1.08617
       :utime: 0.48
       :stime: 0.59
     - :real: 1.1052
       :utime: 0.47
       :stime: 0.59
     ...
  40. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  41. Reporting

     $ perfer report examples/file_stat.rb
     Ran at 2012-08-10 19:03:37 with jruby 1.7.0.preview2
     File.stat 7.143 µs/i ± 0.164 (2.3%) <=> 140000 ips
     Ran at 2012-08-14 20:15:30 with jruby 1.7.0.preview2
     File.stat 7.244 µs/i ± 0.181 (2.5%) <=> 138037 ips
  42. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  43. Comparison

    [Box plot: “Boxplot for comparison”; y axis: 2e−04 to 8e−04] Using intervals to estimate imprecision Execution time ratio as an interval as well
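The interval idea can be sketched numerically (the numbers below reuse the earlier example output, and the widest-case ratio bounds are one simple way to propagate the error):

```ruby
# Two jobs reported as mean ± error (seconds per iteration).
a_mean, a_err = 7.244e-6, 0.181e-6
b_mean, b_err = 7.143e-6, 0.164e-6

# Widest-case bounds on the ratio A/B.
ratio_lo = (a_mean - a_err) / (b_mean + b_err)
ratio_hi = (a_mean + a_err) / (b_mean - b_err)

puts format("A/B in [%.3f, %.3f]", ratio_lo, ratio_hi)
# If the interval contains 1.0, the measured difference may not be meaningful.
puts "difference not clearly significant" if ratio_lo < 1.0 && ratio_hi > 1.0
```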
  44. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  45. Visualization CLI-based with: R: a heavy dependency, but has all kinds

    of statistical charts out of the box ImageMagick: a system library is needed, but no direct support for charts, and current libraries don’t seem appropriate Web-based with a JS chart library: usually no support for box plots and visual ways to represent “error”
  46. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  47. Current status Mainly a work in progress Measuring is implemented

    for iterations; it may need some tuning and fewer hardcoded limits Persistence is there and shouldn’t change much Comparison and graphing still need to be implemented
  48. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  49. Big time Use cases: Estimating more precisely the impact of

    new features like refinements, algorithm improvements, etc. The ability to compare performance meaningfully Having some sort of continuous benchmarking, which helps to see the changes over time in different scenarios. For implementations but also libraries.
  50. Questions Any questions?