• flamegraph: graphs to visualize stack traces
• message_bus: long polling support for Rack apps and group messaging
• fast_blank: native rewrite of blank?, free perf bump for Rails apps
• lru_redux: fastest LRU cache implementation available for Ruby
Many issues are addressed in 2.1.1 (not yet released as of 19 Feb 2014)
SEGV in Rails, broken faraday gem, broken excon gem
Memory usage is much higher - more info at: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59728
• Look at various pages on Discourse both as admin and anon
• Run Discourse spec suite
• Look at production vs default stacks
• Run bench on a stable CPU, bare metal (cpufreq-util set to performance)
Sasada (RGenGC)
• Granular Global Method Cache invalidation – by James Golick (ported by Charlie Somerville to 2.1)
• Reduced object count on boot – by Aman Gupta
• Additional GC tuning ENV vars – Aman and Koichi
• Frozen string cache – Charlie and Koichi
Gupta (Ruby-core developer)
• Used at GitHub in production
• Contains fixes for all urgent 2.1.0 issues found to date
• Contains performance patch sets, notably a vastly improved method cache by funny-falcon
• 5-10% faster across the board for Discourse bench
• https://github.com/github/ruby
expansion of heaps, cuts down GC on startup
• RUBY_GC_HEAP_FREE_SLOTS=600000
• Ensure enough free heap space for a large amount of reqs (4096 by default)
• RUBY_GC_HEAP_GROWTH_FACTOR=1.25
• Grow heaps slower (1.8 by default)
• RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000
• Cap heap growth (not set by default)
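To get a feel for what the growth factor and the growth cap do together, here is a rough sketch in plain Ruby. It is a simplified model, not the interpreter's actual algorithm: it assumes each expansion multiplies the slot count by the factor and that the cap limits how many slots a single expansion may add. The `heap_growth` helper and the 600,000-slot starting point are made up for illustration.

```ruby
# Simplified model of heap expansion (an assumption, not MRI's real code):
# each step multiplies the slot count by `factor`, but never adds more
# than `max_growth` slots when a cap is given.
def heap_growth(start_slots, factor:, max_growth: nil, steps: 5)
  sizes = [start_slots]
  steps.times do
    current = sizes.last
    grown = (current * factor).to_i
    grown = [grown, current + max_growth].min if max_growth
    sizes << grown
  end
  sizes
end

# Values from the slide: factor 1.25 with a 300000 slot cap,
# compared with the default factor of 1.8 and no cap.
tuned   = heap_growth(600_000, factor: 1.25, max_growth: 300_000)
default = heap_growth(600_000, factor: 1.8)

puts "tuned:   #{tuned.inspect}"
puts "default: #{default.inspect}"
```

Under this model the tuned heap grows far more gently than the default one, and the cap only starts to bite once the heap is large enough that 25% of it exceeds 300,000 slots.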
contains a GC tracer (print lazy sweep vs minor vs major)
• Invoke GC::OOB.run after every request (post process)
• Works with unicorn, may work with passenger in future
• GC is NOT disabled, no need for unicorn killers etc.
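As a rough illustration of "after every request", the sketch below shows one generic way to run a hook once a Rack-style response body has been closed. The `AfterRequest` and `AfterRequestBody` classes are hypothetical names invented here, not part of gctools or Rack, and a body-close hook is only an approximation of true post-processing, since some servers close the body before the socket. In a real app the hook might be `-> { GC::OOB.run }` with gctools loaded.

```ruby
# Hypothetical wrapper: delegates to the real body and fires the hook
# once the server closes it (after the response has been written out).
class AfterRequestBody
  def initialize(body, hook)
    @body = body
    @hook = hook
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    @body.close if @body.respond_to?(:close)
  ensure
    @hook.call
  end
end

# Hypothetical Rack-style middleware: wraps every response body so the
# hook runs after the request has been served.
class AfterRequest
  def initialize(app, hook)
    @app = app
    @hook = hook
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, AfterRequestBody.new(body, @hook)]
  end
end

# Stand-in app and hook; swap the hook for -> { GC::OOB.run } in practice.
hook_calls = []
app = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["ok"]] }
stack = AfterRequest.new(app, -> { hook_calls << :ran })

status, _headers, body = stack.call({})
chunks = []
body.each { |chunk| chunks << chunk }
calls_before_close = hook_calls.dup
body.close
```

The hook does not run while the body is being streamed, only on close, so any GC work it triggers stays out of the time the client spends waiting.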
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
• Setting it to 1.5 will reduce RSS from 248 Megs -> 197 Megs
• Will impact perf a bit in some cases, OOBGC will reduce impact
• You can disable RGenGC by setting it to 0.9
• Combined with OOBGC reduces RSS to 176 Megs (130 PSS for 3 workers)
• That is only slightly higher than 2.0! (100 PSS)
on GC start / stop
• New C level extension points to view internal state
• Not compatible with Ruby 2.0
• Not (yet) compatible with passenger – PR in open discussion
install from source, injected using LD_PRELOAD=/usr/local/lib/libjemalloc.so
• 6% RSS reduction under 2.1.0 – particularly effective when heaps get big
• Similar performance to glibc malloc
• Keeps RSS low over the long run
• Other option is tcmalloc (used by GitHub on most setups)
mostly compatible with 2.0.0
• 2.1.0 GitHub edition avoids nasty segfaults during spec runs
• Thanks to Koichi Sasada we no longer need GC malloc voodoo env vars to make specs fast
Remove objects common to snapshots 1 and 2 from snapshot 2
• Remove objects missing in snapshot 3 from snapshot 2
• GC.start may miss some objects
• rbtrace could be gathering snapshots during requests
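The snapshot comparison above boils down to two set operations. Here is a minimal sketch, assuming each snapshot has already been reduced to a Set of object ids; how the snapshots are gathered is outside the sketch, and `leak_candidates` is a hypothetical helper name.

```ruby
require 'set'

# Objects that first appear in snapshot 2 (not in snapshot 1) and are
# still alive in snapshot 3 are the candidates worth investigating.
def leak_candidates(snapshot1, snapshot2, snapshot3)
  new_in_second = snapshot2 - snapshot1   # drop objects common to 1 and 2
  new_in_second & snapshot3               # drop objects missing from 3
end

# Made-up object ids standing in for real heap snapshots.
snap1 = Set[1, 2, 3]
snap2 = Set[1, 2, 3, 4, 5, 6]
snap3 = Set[1, 2, 5, 6, 7]

candidates = leak_candidates(snap1, snap2, snap3)
puts candidates.to_a.inspect
```

Here ids 5 and 6 were allocated between snapshots 1 and 2 and survived to snapshot 3, while id 4 was collected. Because GC.start may miss some objects, the result is a list of suspects rather than proof of a leak.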
around after measured block
• Allocated are objects allocated during block of code
• High retained = increased memory use, slower major GC
• High allocated = slower perf, increased memory use
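For a quick look at the "allocated" side without any gem, the interpreter's own counter can be read before and after a block. This is a minimal sketch only: `count_allocations` is a hypothetical helper, it says nothing about retained objects, and it assumes the `:total_allocated_objects` key of `GC.stat` (Ruby 2.1 itself spelled it `:total_allocated_object`).

```ruby
# Counts objects allocated while the block runs, using GC.stat.
# GC is paused during the measurement to keep the run undisturbed,
# then restored to its previous state.
def count_allocations
  was_disabled = GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable unless was_disabled
end

allocated = count_allocations do
  Array.new(100) { Object.new }
end

puts "objects allocated in block: #{allocated}"
```

The count includes the array itself and anything else the block happens to allocate, so it is best read as a relative number when comparing two versions of the same code.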
gems/pg-0.15.1/lib/pg/result.rb:10 x 18
- pg gem returns strings for dates, booleans, integers, floats.
- ActiveRecord is stuck converting strings to native types in pure Ruby
- Discussing with pg gem owners a fix that converts types in C extension
diagnose
• You can take advantage of the new interfaces today
• Hold off on Ruby 2.1.0 in production, 2.1.1 will be safe
• Don’t apply optimisations blindly. ALWAYS BE MEASURING.