
Benchmarking the right way with Perfer

Benoit Daloze

August 17, 2012

Transcript

  1. Benchmarking the right way with Perfer Benoit Daloze August 17,

    2012
  2. Who am I? Benoit Daloze (GitHub: eregon, Twitter: @eregontp) A

    student at Université Catholique de Louvain (Belgium) An MRI committer, mainly interested in what’s happening for the next version and, of course, in anything benchmark-related
  3. The talk Benchmarking the right way . . . Factors

    Measuring Analyzing Existing tools . . . with Perfer Measurement Reporting Comparison Visualization Current status Future
  4. Context A complete benchmark tool (and suite) for Ruby implementations

    and libraries Google Summer of Code project http://www.google-melange.com/gsoc/project/ google/gsoc2012/eregon/19001 Perfer: https://github.com/jruby/perfer RubyBench: https://github.com/jruby/rubybench
  5. Motivation Automated benchmarking is not so common in the Ruby

    community Current tools provide the basics, but usually don’t protect the user from mistakes and might lead to wrong interpretations Benchmarking such a dynamic language across implementations is hard, so it’s worth investigating the problems
  6. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools . . . with Perfer
  7. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  8. Factors Anything which can influence the execution environment: JIT Memory,

    GC Code loaded Context
  9. JIT: JRuby

    [Scatter plot: “JIT with DateTime.strptime(...) for JRuby”; x axis: time (s), y axis: time (s) / iteration]
  10. JIT: JRuby

    [Same plot as the previous slide, zoomed in on the y axis: the per-iteration time drops as the JIT warms up]
  11. JIT: Rubinius

    [Scatter plot: “JIT with DateTime.strptime(...) for Rubinius”; x axis: time (s), y axis: time (s) / iteration]
  12. JIT: MRI

    [Scatter plot: “JIT with DateTime.strptime(...) for MRI”; x axis: time (s), y axis: time (s) / iteration]
  13. JIT: MRI

    [Same plot as the previous slide, zoomed in on the y axis: no warm-up trend, only noise]
  14. Memory, GC and loading code

    [Bar chart: “Performance of recursive factorial(5000)” with nothing loaded / stdlib / one gem / rails, comparing time, GC average time, number of objects and memory used]
  15. Context Evaluation context is important Closures are typically a lot

    slower than static methods But they are the easiest way to pass data
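As a rough sketch of this point (not from the slides; the method and lambda names are invented for illustration), the standard Benchmark library can show the gap between calling a plain method and calling a closure:

```ruby
require 'benchmark'

# A plain method vs. an equivalent closure (both names are made up here).
def double_it(x)
  x * 2
end
double_closure = lambda { |x| x * 2 }

n = 1_000_000
Benchmark.bm(8) do |bm|
  bm.report("method")  { n.times { |i| double_it(i) } }
  bm.report("closure") { n.times { |i| double_closure.call(i) } }
end
```

The exact ratio depends heavily on the implementation and its JIT, which is precisely why the evaluation context matters when benchmarking.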
  16. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  17. Time Precision: t=Time.now; Time.now-t ⇒ 1 µs to 1 ms Hitimes::Interval.measure

    {} ⇒ 0.2 µs to 6 µs
  18. Time Precision: t=Time.now; Time.now-t ⇒ 1 µs to 1 ms Hitimes::Interval.measure

    {} ⇒ 0.2 µs to 6 µs Overhead: to yield a block of code: about 100 ns/iteration; for a while loop: about 50 ns/iteration; when repeating the code 100 times: less than 1 ns/iteration
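The Time.now trick on the slide can be turned into a tiny script (a sketch only; the observed resolution depends on the OS and the Ruby implementation):

```ruby
# Measure the apparent precision of Time.now by timing back-to-back calls,
# as in the slide's t = Time.now; Time.now - t idiom.
deltas = Array.new(1_000) do
  t = Time.now
  Time.now - t
end

smallest = deltas.reject { |d| d <= 0 }.min
puts "smallest non-zero delta: #{smallest} s" if smallest
```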
  19. Memory Platform-specific functions and ps(1). When? Between each

    iteration: too much overhead. Before/after a set of iterations: imprecise. A thread polling regularly: thread overhead and still imprecise.
  20. Memory Platform-specific functions and ps(1). When? Between each

    iteration: too much overhead. Before/after a set of iterations: imprecise. A thread polling regularly: thread overhead and still imprecise. Just an approximation, but GC statistics might tell the number of objects and internal heap usage. Useful to detect swapping.
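On MRI, the GC statistics mentioned here are exposed through GC.stat; a minimal sketch (key names vary between Ruby versions, so they are checked before use):

```ruby
# Sample GC counters around a workload to approximate allocation behaviour.
before = GC.stat[:count]
100_000.times { "abc".dup }   # placeholder workload that allocates objects
after = GC.stat[:count]

puts "GC runs during workload: #{after - before}"
live = GC.stat[:heap_live_slots]
puts "live heap slots: #{live}" if live
```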
  21. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  22. Statistics Statistics as a way to analyze data automatically To

    take error and uncertainty into account To know when the steady state is reached
  23. Distribution and outliers

    [Histogram: “Distribution of 1000 DateTime.strptime measurements”; x axis: Time (1.0–1.6), y axis: Density]
  24. Distribution

    [Same histogram, zoomed in on Time 1.00–1.10]
  25. Central tendency: mean, median

    [Density plot: “Mean and Median (10 measurements, strptime)”; x axis: Time, y axis: Density, with vertical lines marking the Median and the Mean]
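The difference between the two can be computed directly; a small sketch with made-up measurement values, where one outlier pulls the mean but barely moves the median:

```ruby
# Mean vs. median on ten fake timing samples (seconds); the last is an outlier.
times = [1.065, 1.068, 1.070, 1.071, 1.073, 1.074, 1.076, 1.078, 1.080, 1.120]

mean = times.sum / times.size
sorted = times.sort
mid = sorted.size / 2
median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

puts "mean:   #{mean}"    # pulled up by the 1.120 outlier
puts "median: #{median}"  # barely affected by it
```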
  26. Measures of dispersion Standard deviation Interquartile range Median absolute deviation

    Maximum absolute deviation
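All four measures can be sketched in plain Ruby (the sample values are invented; for an even-sized half, the quartile below is taken as the median of that half, which is one of several conventions):

```ruby
# Dispersion measures on a small sample containing one outlier.
data = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 5.0]

def median(xs)
  s = xs.sort
  m = s.size / 2
  s.size.odd? ? s[m] : (s[m - 1] + s[m]) / 2.0
end

mean = data.sum / data.size
std  = Math.sqrt(data.sum { |x| (x - mean)**2 } / data.size)  # standard deviation

s = data.sort
half = s.size / 2
iqr = median(s.last(half)) - median(s.first(half))            # interquartile range

med = data.then { |d| median(d) }
abs_devs = data.map { |x| (x - med).abs }
mad    = median(abs_devs)                                     # median absolute deviation
max_ad = abs_devs.max                                         # maximum absolute deviation

puts "std: #{std.round(3)}, IQR: #{iqr.round(3)}, MAD: #{mad.round(3)}, max AD: #{max_ad.round(3)}"
```

Note how the outlier inflates the standard deviation and the maximum absolute deviation, while the IQR and MAD stay small: the robust measures describe the bulk of the sample.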
  27. Measures of dispersion

    [Scatter plot: “JIT with DateTime.strptime(...) for JRuby”; x axis: time (s), y axis: time (s) / iteration]
  28. Benchmarking the right way . . . Factors Measuring Analyzing

    Existing tools
  29. The standard library

     Benchmark.measure { code }
     # => #<Benchmark::Tms @real=1.1425, @utime=0.47, @stime=0.6>

     10.times { puts Benchmark.measure { code } }
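A runnable version of the slide's idea (the timed body is a placeholder; on the slide, code stands for the code under test):

```ruby
require 'benchmark'

# Measure a placeholder workload several times; printing each Benchmark::Tms
# makes the run-to-run variation visible, as the slide suggests.
results = Array.new(3) do
  Benchmark.measure { 100_000.times { |i| i.to_s } }
end
results.each { |tms| puts tms }
```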
  30. MRI benchmark/driver.rb and the suite Time measured is $ time

    ruby bench.rb Does not work well for JRuby because of VM invocation Not very stable: depends a lot on how little code is loaded and on initial memory usage Inequality of running times (0.03 seconds to 2 minutes) makes comparison difficult
  31. Other libraries benchmark-ips from Evan Phoenix viiite Many others for

    specific use-cases or similar to stdlib: rbench: comparisons diffbench: comparisons with a git workflow BenchmarkX: graphics rubyperf: measurements . . .
  32. Benchmarking the right way . . . . . .

    with Perfer Measurement Reporting Comparison Visualization Current status Future
  33. Perfer vocabulary Benchmark suite Sessions Jobs Two main benchmark types:

    iterative: precisely finds how much time some code without side effects takes input size based: when the input size is the natural parameter (e.g. Array#sort)
  34. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  35. Measurement API: iterations

     require 'tmpdir'
     dir = Dir.tmpdir

     Perfer.session "File.stat" do |s|
       s.iterate "Simple block" do
         File.stat(dir)
       end

       s.iterate "Block with given argument" do |n|
         i = 0
         while i < n
           File.stat(dir)
           i += 1
         end
       end

       s.iterate "String for eval", "File.stat(dir)", :dir => dir
     end
  36. Measurement API: input size

     Perfer.session "Array#sort" do |s|
       s.metadata do
         description "Sort an Array of random integers"
         tags Array, :sorting
         n "size of the Array"
         start 1024
         generator { |n| n * 2 }
       end

       s.bench "Array#sort" do |n|
         ary = Array.new(n) { rand(n) }
         s.measure do
           ary.sort
         end
       end
     end
  37. Measurement CLI

     $ perfer run examples/file_stat.rb
     Session File.stat with jruby 1.7.0.preview2
     Taking 10 measurements of at least 1.0s
     File.stat 7.244 µs/i ± 0.181 (2.5%) <=> 138037 ips

     $ perfer help
     [...]
     Common options:
       -t TIME  Minimal time to run (greater usually improves accuracy)
       -m N     Number of measurements per job
       -v       Verbose
  38. Measurement: metadata recorded

     :file: .../file_stat.rb
     :session: File.stat
     :ruby: jruby 1.7.0.preview2 (1.9.3p203) ...
     :command_line: ! '/usr/bin/java ... bin/perfer run examples/file_stat.rb'
     :run_time: 2012-08-14 20:15:30.463000000 +02:00
     :minimal_time: 1.0
     :measurements: 10
     :verbose: false
     :git_branch: master
     :git_commit: 3433deb6727e...
     :bench_file_checksum: f04210b5382a...
     :job: File.stat
     :iterations: 143700
  39. Measurement: data recorded

     - :real: 1.1425
       :utime: 0.47
       :stime: 0.6
     - :real: 1.08617
       :utime: 0.48
       :stime: 0.59
     - :real: 1.1052
       :utime: 0.47
       :stime: 0.59
     ...
  40. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  41. Reporting

     $ perfer report examples/file_stat.rb
     Ran at 2012-08-10 19:03:37 with jruby 1.7.0.preview2
     File.stat 7.143 µs/i ± 0.164 (2.3%) <=> 140000 ips
     Ran at 2012-08-14 20:15:30 with jruby 1.7.0.preview2
     File.stat 7.244 µs/i ± 0.181 (2.5%) <=> 138037 ips
  42. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  43. Comparison

    [Box plot: “Boxplot for comparison”; y axis: 2e−04 to 8e−04] Using intervals to estimate imprecision Execution time ratio as an interval as well
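The interval idea can be sketched numerically (the numbers below reuse the earlier example output, and the widest-case ratio bounds are one simple way to propagate the error):

```ruby
# Two jobs reported as mean ± error (seconds per iteration).
a_mean, a_err = 7.244e-6, 0.181e-6
b_mean, b_err = 7.143e-6, 0.164e-6

# Widest-case bounds on the ratio A/B.
ratio_lo = (a_mean - a_err) / (b_mean + b_err)
ratio_hi = (a_mean + a_err) / (b_mean - b_err)

puts format("A/B in [%.3f, %.3f]", ratio_lo, ratio_hi)
# If the interval contains 1.0, the measured difference may not be meaningful.
puts "difference not clearly significant" if ratio_lo < 1.0 && ratio_hi > 1.0
```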
  44. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  45. Visualization CLI-based with: R: a heavy dependency, but has all kinds

    of statistical charts out of the box ImageMagick: a system library is needed, but no direct support for charts, and current libraries don’t seem appropriate Web-based with a JS chart library: usually no support for box plots and visual ways to represent “error”
  46. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  47. Current status Mainly a work in progress Measuring is implemented

    for iterations; it may need some tuning and fewer hardcoded limits Persistence is there and shouldn’t change much Comparison and graphing still need to be implemented
  48. . . . with Perfer Measurement Reporting Comparison Visualization Current

    status Future
  49. Big time Use cases: Estimating more precisely the impact of

    new features like refinements, algorithm improvements, etc. The ability to compare performance meaningfully Having some sort of continuous benchmarking, which helps to see the changes over time in different scenarios. For implementations but also libraries.
  50. Questions Any questions?