Benchmarking the right way with Perfer

Benoit Daloze

August 17, 2012
Transcript

  1. Who am I? Benoit Daloze (GitHub: eregon, Twitter: @eregontp)
     A student at Université Catholique de Louvain (Belgium)
     An MRI committer, mainly interested in what's happening for the next version and, of course, in anything benchmark-related
  2. The talk
     Benchmarking the right way...: factors, measuring, analyzing, existing tools
     ...with Perfer: measurement, reporting, comparison, visualization, current status, future
  3. Context
     A complete benchmark tool (and suite) for Ruby implementations and libraries
     Google Summer of Code project: http://www.google-melange.com/gsoc/project/google/gsoc2012/eregon/19001
     Perfer: https://github.com/jruby/perfer
     RubyBench: https://github.com/jruby/rubybench
  4. Motivation
     Automated benchmarking is not so common in the Ruby community
     Current tools provide the basics, but usually don't guard the user against mistakes and might lead to wrong interpretations
     Benchmarking such a dynamic language across implementations is hard; it's worth investigating the problems
  5. JIT: JRuby
     [Scatter plot: "JIT with DateTime.strptime(...) for JRuby"; x axis: time (s), 0 to 20; y axis: time (s) / iteration, 0.000 to 0.015]
  6. JIT: JRuby
     [Scatter plot: "JIT with DateTime.strptime(...) for JRuby"; x axis: time (s), 0 to 20; y axis: time (s) / iteration, 0.000 to 0.005]
  7. JIT: Rubinius
     [Scatter plot: "JIT with DateTime.strptime(...) for Rubinius"; x axis: time (s), 0 to 20; y axis: time (s) / iteration, 0.000 to 0.005]
  8. JIT: MRI
     [Scatter plot: "JIT with DateTime.strptime(...) for MRI"; x axis: time (s), 0 to 20; y axis: time (s) / iteration, 0.00060 to 0.00075]
  9. Memory, GC and loading code
     [Bar chart: "Performance of recursive factorial(5000)"; compares nothing loaded / stdlib / one gem / rails on time, GC average time, number of objects and memory used; scale 0 to 8]
  10. Context
     Evaluation context is important
     Closures are typically a lot slower than static methods
     But they are the easiest way to pass data
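The closure-versus-method gap above can be sketched with the stdlib Benchmark module. This is only an illustration (the work loop, names and iteration count are made up here, not from the talk); absolute numbers vary a lot by VM:

```ruby
require 'benchmark'

# The same trivial work, once as a static method and once as a closure.
def static_work(n)
  s = 0
  i = 0
  while i < n
    s += i
    i += 1
  end
  s
end

block_work = lambda do |n|
  s = 0
  i = 0
  while i < n
    s += i
    i += 1
  end
  s
end

N = 1_000_000
Benchmark.bm(10) do |x|
  x.report('method:')  { static_work(N) }
  x.report('closure:') { block_work.call(N) }
end
```

On most VMs the closure version is noticeably slower, which is the trade-off the slide mentions: closures cost more but are the easiest way to capture data from the surrounding scope.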
  11. Time
     Precision:
       t = Time.now; Time.now - t ⇒ 1 µs to 1 ms
       Hitimes::Interval.measure {} ⇒ 0.2 µs to 6 µs
     Overhead:
       to yield a block of code: about 100 ns/iteration
       for a while loop: about 50 ns/iteration
       when repeating the code 100 times: less than 1 ns/iteration
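A rough sketch of how such per-iteration overheads can be estimated: time a bare while loop, then the same loop yielding an empty block, and divide the difference by the iteration count. It uses plain Time.now for simplicity (with the limited precision noted above), and the helper names are mine, not Perfer's:

```ruby
# Time an empty while loop of n iterations.
def bench_while(n)
  t = Time.now
  i = 0
  i += 1 while i < n
  Time.now - t
end

# Time the same loop, yielding an (empty) block each iteration.
def bench_yield(n)
  t = Time.now
  i = 0
  while i < n
    yield
    i += 1
  end
  Time.now - t
end

N = 1_000_000
loop_time  = bench_while(N)
yield_time = bench_yield(N) {}

# Approximate block-yield overhead, in ns per iteration
overhead_ns = (yield_time - loop_time) / N * 1e9
```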
  12. Memory
     Platform-specific functions and ps(1). When?
       between each iteration - too much overhead
       before/after a set of iterations - imprecise
       thread polling regularly - thread overhead and still imprecise
  13. Memory
     Platform-specific functions and ps(1). When?
       between each iteration - too much overhead
       before/after a set of iterations - imprecise
       thread polling regularly - thread overhead and still imprecise
     Just an approximation, but GC statistics might tell the number of objects and internal heap usage. Useful to detect swapping.
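A minimal sketch of the two approximations mentioned: ps(1) for the resident set size (assumes a POSIX system with ps available), and GC.stat for object/heap statistics (the keys it returns differ between Ruby versions and VMs):

```ruby
# Resident set size of the current process in KB, via ps(1) (POSIX only).
def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i
end

before_stats = GC.stat   # heap/object counters; keys vary across versions
before_rss   = rss_kb

ary = Array.new(100_000) { |i| "object #{i}" }  # allocate many objects

after_stats = GC.stat
after_rss   = rss_kb     # should have grown, but only approximately
```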
  14. Statistics
     Statistics as a way to analyze data automatically
     To take error and uncertainty into account
     To know when the steady state is reached
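The basic statistics involved can be sketched in a few lines: the mean and median as measures of central tendency, and the sample standard deviation as a measure of dispersion (the timing values here are invented for illustration):

```ruby
def mean(xs)
  xs.inject(:+) / xs.size.to_f
end

def median(xs)
  s = xs.sort
  mid = s.size / 2
  s.size.odd? ? s[mid] : (s[mid - 1] + s[mid]) / 2.0
end

# Sample standard deviation (divides by n - 1).
def stddev(xs)
  m = mean(xs)
  Math.sqrt(xs.map { |x| (x - m)**2 }.inject(:+) / (xs.size - 1))
end

times = [1.065, 1.070, 1.068, 1.075, 1.080]  # hypothetical measurements (s)
mean(times)    # central tendency
median(times)  # robust against outliers
stddev(times)  # dispersion, used for the ± error estimate
```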
  15. Central tendency: mean, median
     [Density plot: "Mean and Median (10 measurements, strptime)"; x axis: time, 1.065 to 1.080; y axis: density, 0 to 250; median and mean marked]
  16. Measures of dispersion
     [Scatter plot: "JIT with DateTime.strptime(...) for JRuby"; x axis: time (s), 0 to 20; y axis: time (s) / iteration, 0.000 to 0.005]
  17. The standard library
     Benchmark.measure { code }
     # => #<Benchmark::Tms @real=1.1425, @utime=0.47, @stime=0.6>
     10.times { puts Benchmark.measure { code } }
  18. MRI benchmark/driver.rb and the suite
     Time measured is $ time ruby bench.rb
     Does not work well for JRuby because of VM invocation
     Not very stable: depends a lot on the small amount of code loaded and on initial memory usage
     Inequality of running times (0.03 seconds to 2 minutes) makes comparison difficult
  19. Other libraries
     benchmark-ips from Evan Phoenix
     viiite
     Many others for specific use-cases or similar to stdlib:
       rbench: comparisons
       diffbench: comparisons with git workflow
       BenchmarkX: graphics
       rubyperf: measurements
       ...
  20. Benchmarking the right way...
     ...with Perfer: measurement, reporting, comparison, visualization, current status, future
  21. Perfer vocabulary
     Benchmark suite, sessions, jobs
     Two main benchmark types:
       iterative: precisely finds how much time some side-effect-free code takes
       input-size based: when the input size is the natural parameter (e.g. Array#sort)
  22. Measurement API: iterations
     require 'tmpdir'
     dir = Dir.tmpdir

     Perfer.session "File.stat" do |s|
       s.iterate "Simple block" do
         File.stat(dir)
       end

       s.iterate "Block with given argument" do |n|
         i = 0
         while i < n
           File.stat(dir)
           i += 1
         end
       end

       s.iterate "String for eval", "File.stat(dir)", :dir => dir
     end
  23. Measurement API: input size
     Perfer.session "Array#sort" do |s|
       s.metadata do
         description "Sort an Array of random integers"
         tags Array, :sorting
         n "size of the Array"
         start 1024
         generator { |n| n * 2 }
       end

       s.bench "Array#sort" do |n|
         ary = Array.new(n) { rand(n) }
         s.measure do
           ary.sort
         end
       end
     end
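To make the metadata concrete: with a start of 1024 and a generator of n * 2, the job runs on doubling input sizes. A sketch of what the first few generated sizes look like (not Perfer's internals, just the arithmetic the metadata implies):

```ruby
# First five input sizes produced by start 1024 and generator { |n| n * 2 }
sizes = []
n = 1024
5.times do
  sizes << n
  n *= 2
end
sizes  # => [1024, 2048, 4096, 8192, 16384]
```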
  24. Measurement CLI
     $ perfer run examples/file_stat.rb
     Session File.stat with jruby 1.7.0.preview2
     Taking 10 measurements of at least 1.0s
     File.stat 7.244 µs/i ± 0.181 ( 2.5%) <=> 138037 ips

     $ perfer help
     [...]
     Common options:
       -t TIME   Minimal time to run (greater usually improves accuracy)
       -m N      Number of measurements per job
       -v        Verbose
  25. Measurement: metadata recorded
     metadata:
       :file: .../file_stat.rb
       :session: File.stat
       :ruby: jruby 1.7.0.preview2 (1.9.3p203) ...
       :command_line: ! '/usr/bin/java ... bin/perfer run examples/file_stat.rb'
       :run_time: 2012-08-14 20:15:30.463000000 +02:00
       :minimal_time: 1.0
       :measurements: 10
       :verbose: false
       :git_branch: master
       :git_commit: 3433deb6727e...
       :bench_file_checksum: f04210b5382a...
       :job: File.stat
       :iterations: 143700
  26. Measurement: data recorded
     data:
     - :real: 1.1425
       :utime: 0.47
       :stime: 0.6
     - :real: 1.08617
       :utime: 0.48
       :stime: 0.59
     - :real: 1.1052
       :utime: 0.47
       :stime: 0.59
     ...
  27. Reporting
     $ perfer report examples/file_stat.rb
     Ran at 2012-08-10 19:03:37 with jruby 1.7.0.preview2
       File.stat 7.143 µs/i ± 0.164 ( 2.3%) <=> 140000 ips
     Ran at 2012-08-14 20:15:30 with jruby 1.7.0.preview2
       File.stat 7.244 µs/i ± 0.181 ( 2.5%) <=> 138037 ips
  28. Comparison
     [Box plot for comparison; scale 2e-04 to 8e-04]
     Using intervals to estimate imprecision
     Execution time ratio as an interval as well
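One simple way to turn two "mean ± error" measurements into a ratio interval is to take the extreme combinations of the bounds. This is only a sketch of the interval idea, not necessarily the method Perfer ended up using; the sample values are the two File.stat timings from the report slide:

```ruby
# Conservative interval for the ratio a/b, given measurements a ± da and b ± db.
def ratio_interval(a, da, b, db)
  lo = (a - da) / (b + db)
  hi = (a + da) / (b - db)
  [lo, hi]
end

# 7.244 µs/i ± 0.181 vs 7.143 µs/i ± 0.164 (from the report example)
lo, hi = ratio_interval(7.244e-6, 0.181e-6, 7.143e-6, 0.164e-6)
# The interval brackets the point ratio 7.244 / 7.143 ≈ 1.014, and since it
# also contains 1.0, the two runs are not clearly distinguishable.
```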
  29. Visualization
     CLI-based with:
       R: heavy dependency, but has all kinds of statistical charts out of the box
       ImageMagick: system library needed, but no direct support for charts, and current libraries don't seem appropriate
     Web-based with a JS chart library
       Usually no support for box plots and visual ways to represent "error"
  30. Current status
     Mainly a work in progress
     Measuring is implemented for iterations; it might need some tuning and fewer hardcoded limits
     Persistence is there and shouldn't change much
     Comparison and graphing still need to be implemented
  31. Big time
     Use cases:
       Estimating more precisely the impact of new features like refinements, algorithm improvements, etc.
       Ability to compare performance meaningfully
       Having some sort of continuous benchmarking, which helps to see changes over time in different scenarios, for implementations but also libraries