Why Ruby 2.1 excites me?

Sam
February 20, 2014

Slides for my talk at rubyconfau 2014


Transcript

  1. Who am I?
     Sam Saffron
     @samsaffron
     samsaffron.com
     Co-founder, Discourse
     Previously at Stack Overflow
     Premature optimiser
  2. About Discourse
     • Next generation forum software
     • Rails 4 and Rails master
     • Postgres / Redis
     • Ember.js
     • Open Source
     • We host on unicorn / nginx / haproxy
  3. My Gems
     • rack-mini-profiler: visualize performance of your web apps
     • flamegraph: graphs to visualize stack traces
     • message_bus: long polling support for rack apps and group messaging
     • fast_blank: native rewrite of blank?, free perf bump for Rails apps
     • lru_redux: fastest LRU cache implementation available for Ruby
  4. Agenda
     • Is Ruby 2.1 faster than 2.0?
     • Tuning Ruby 2.1
     • Tracking memory leaks with Ruby 2.1
     • Tracking memory usage with Ruby 2.1
     • Future work
  5. Warning
     Ruby 2.1.0 is not ready for production.
     Many issues are addressed in 2.1.1 (not yet released as of 19 Feb 2014)
     SEGV in Rails, broken faraday gem, broken excon gem
     Memory usage is much higher
     - more info at: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59728
  6. Discourse Bench
     • Simulate 1000 topics and 100 users
     • Look at various pages on Discourse both as admin and anon
     • Run Discourse spec suite
     • Look at production vs default stacks
     • Run bench on bare metal with a stable CPU (cpufreq-util set to performance)
  7. Sample Results
     Page timings by percentile:
       categories:        50: 89    75: 158   90: 160   99: 165
       home:              50: 51    75: 54    90: 120   99: 123
       topic:             50: 14    75: 15    90: 16    99: 85
       categories_admin:  50: 109   75: 175   90: 179   99: 207
       home_admin:        50: 65    75: 129   90: 134   99: 138
       topic_admin:       50: 24    75: 25    90: 27    99: 96
     timings:
       load_rails: 3802
     ruby-version: 2.0.0-p353
     rss_kb: 146328
     pss_kb: 142437
     architecture: amd64
     operatingsystem: Ubuntu
     kernelversion: 3.8.0
     memorysize: 23.55 GB
     physicalprocessorcount: 1
     processor0: Intel(R) Core(TM) i7 CPU 960 @
     virtual: phys
  8. Ruby 2.1.0 vs Ruby 2.0.0-p353
     [Bar chart "Memory": RSS (kb), Heap Used (kb) and Post GC RVALUEs (*1000), for 2.0.0-p353 and 2.1.0]
  9. Page load time
     [Bar chart: Home, Home (admin), Categories and Categories (admin); series: 2.0 median, 2.1 median, 2.0 90th percentile, 2.1 90th percentile]
  10. Summary (out-of-the-box setup)
      • 2.1.0 consumes much more memory (75% RSS increase)
      • 2.1.0 is faster for median (1-2% faster)
      • 2.1.0 is significantly faster for 90th percentile (in some cases 60%+)
  11. Key performance improvements
      • Restricted Generational GC (RGenGC) – by Koichi Sasada
      • Granular Global Method Cache invalidation – by James Golick (ported by Charlie Somerville to 2.1)
      • Reduced object count on boot – by Aman Gupta
      • Additional GC tuning ENV vars – Aman and Koichi
      • Frozen string cache – Charlie and Koichi
  12. Production optimisations
      • Custom Ruby builds
      • GC tuning
      • Unicorn + OOBGC
      • LD_PRELOAD=jemalloc
  13. GitHub ruby 2.1.0 build
      • Patch set managed by Aman Gupta (Ruby-core developer)
      • Used at GitHub in production
      • Contains fixes for all urgent 2.1.0 issues found to date
      • Contains performance patch sets, notably a vastly improved method cache by funny-falcon
      • 5-10% faster across the board for Discourse bench
      • https://github.com/github/ruby
  14. GitHub 2.1 GC tuning
      • RUBY_GC_HEAP_INIT_SLOTS=600000
        • Avoids initial expansion of heaps, cuts down GC on startup
      • RUBY_GC_HEAP_FREE_SLOTS=600000
        • Ensure enough free heap space for a large amount of reqs (4096 by default)
      • RUBY_GC_HEAP_GROWTH_FACTOR=1.25
        • Grow heaps slower (1.8 by default)
      • RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000
        • Cap heap growth (not set by default)
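
The GC reads these variables when the interpreter boots, so they have to be in the environment before the worker's Ruby process starts. One way to do that from Ruby is a small launcher that passes them to the child process. A minimal sketch, assuming the values from the slide; `GC_TUNING` and `spawn_tuned` are hypothetical names, and the unicorn command in the comment is only an example:

```ruby
# Values copied from the slide. GC_TUNING is a hypothetical name.
GC_TUNING = {
  "RUBY_GC_HEAP_INIT_SLOTS"       => "600000",
  "RUBY_GC_HEAP_FREE_SLOTS"       => "600000",
  "RUBY_GC_HEAP_GROWTH_FACTOR"    => "1.25",
  "RUBY_GC_HEAP_GROWTH_MAX_SLOTS" => "300000"
}.freeze

# Start a command with the tuning variables merged into its environment.
# Process.spawn accepts an env hash as its first argument and returns the pid.
def spawn_tuned(*command)
  Process.spawn(GC_TUNING, *command)
end

# Example (hypothetical command line):
#   pid = spawn_tuned("bundle", "exec", "unicorn", "-c", "config/unicorn.rb")
#   Process.wait(pid)
```

The same variables can of course be exported from a shell or an init script; the point is only that they are set outside the process they tune.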
  15. GC tuning effect on bench
      • Performance decrease :( (~5%)
      • Heap size much reduced :)
      • RSS decrease: 266MB -> 241MB
      • Still faster than Ruby 2.1.0 stock
  16. Out-of-band GC
      • Use gctools gem
      • gctools also contains a GC tracer (print lazy sweep vs minor vs major)
      • Invoke GC::OOB.run after every request (post process)
      • Works with unicorn, may work with passenger in future
      • GC is NOT disabled, no need for unicorn killers etc.
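
The idea of collecting between requests can be sketched as a Rack-style middleware that triggers a collection once the server has finished with the response body. This illustrates the technique only and is not the gctools implementation: `OutOfBandGC` and `ClosingBody` are hypothetical names, and the default collector calls `GC::OOB.run` only if gctools happens to be loaded, falling back to a plain `GC.start` otherwise.

```ruby
# Wraps a response body so a callback fires after the server is done with it
# (Rack servers call #close on the body once the response has been sent).
class ClosingBody
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    @body.close if @body.respond_to?(:close)
  ensure
    @on_close.call
  end
end

class OutOfBandGC
  # Prefer gctools' out-of-band entry point when it is available.
  DEFAULT_COLLECTOR = lambda do
    if defined?(GC::OOB)
      GC::OOB.run
    else
      GC.start
    end
  end

  def initialize(app, collector: DEFAULT_COLLECTOR)
    @app = app
    @collector = collector
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, ClosingBody.new(body) { @collector.call }]
  end
end
```

Injecting the collector keeps the middleware testable and makes it easy to compare a plain `GC.start` against gctools on your own workload.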
  17. OOBGC impact
      [Bar chart: Home Page (median), Home Page (99th percentile), Topic Page (median) and Topic Page (99th percentile), for 2.1.0 and 2.1.0 + OOBGC]
  18. OOBGC impact
      • Similar perf for median requests
      • Significantly reduced PSS – child memory impact (20-30% improved for 3 children)
      • Significantly better perf for the 99th percentile. Can be 50% faster.
  19. Reducing RSS further
      • Ruby 2.1.1 (and GitHub branch) introduce RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
      • Setting it to 1.5 will reduce RSS from 248 MB -> 197 MB
      • Will impact perf a bit in some cases, OOBGC will reduce impact
      • You can disable RGenGC by setting it to 0.9
      • Combined with OOBGC reduces RSS to 176 MB (130 PSS for 3 workers)
      • That is only slightly higher than 2.0! (100 PSS)
  20. Ruby 2.1 OOBGC
      • New C level hooks to inform on GC start / stop
      • New C level extension points to view internal state
      • Not compatible with Ruby 2.0
      • Not (yet) compatible with passenger – PR in open discussion
  21. jemalloc
      • Replacement malloc library
      • If you make install from source, inject it using LD_PRELOAD=/usr/local/lib/libjemalloc.so
      • 6% RSS reduction under 2.1.0 – particularly effective when heaps get big
      • Similar performance to glibc malloc
      • Keeps RSS low over the long run
      • Other option is tcmalloc (used by GitHub on most setups)
  22. Run your tests faster with Ruby 2.1.0
      • 2.1.0 is mostly compatible with 2.0.0
      • 2.1.0 GitHub edition avoids nasty segfaults during spec runs
      • Thanks to Koichi Sasada we no longer need GC malloc voodoo env vars to make specs fast
  23. Run your tests faster with Ruby 2.1.0
      [Bar chart: run time in seconds for Discourse bench (duration in seconds, lower is better), for 2.0.0-p353, 2.0.0-p353 with RUBY_GC_MALLOC_LIMIT=40000000, 2.1.0 and 2.1.0 GitHub; two bars are annotated "42% Faster"]
  24. Tracking a memory leak with Ruby 2.1
      • Initial detection: ps reports RSS increasing over time
      • Use the rbtrace gem (require it in the process being analysed; low impact, safe in production)
      • Collect heap dump
      • Cause leak
      • Collect heap dump
      • Compare
  25. Collecting a heap dump with rbtrace
      rbtrace -p 15193 -e 'Thread.new{require "objspace"; ObjectSpace.trace_object_allocations_start; GC.start(); ObjectSpace.dump_all(output: File.open("heap.json","w"))}.join'
  26. Results
      sam@ubuntu discourse % ruby script/diff_heaps.rb heap.json heap2.json
      Leaked 10172 STRING objects at: /home/sam/Source/discourse/config/environments/development.rb:59
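
The slides do not show `script/diff_heaps.rb` itself. As a rough sketch of what such a comparison could look like, assuming each dump is the line-delimited JSON written by `ObjectSpace.dump_all` and that allocation tracing was on so entries carry `file` and `line` fields (`new_objects_by_site` is a hypothetical helper, not the author's script):

```ruby
require "json"
require "set"

# Returns [[type, file, line], count] pairs for objects whose address appears
# in the second dump but not in the first, most frequent allocation site first.
def new_objects_by_site(before_io, after_io)
  seen = Set.new
  before_io.each_line do |line|
    next if line.strip.empty?
    address = JSON.parse(line)["address"]
    seen << address if address
  end

  counts = Hash.new(0)
  after_io.each_line do |line|
    next if line.strip.empty?
    obj = JSON.parse(line)
    next if obj["address"].nil? || seen.include?(obj["address"])
    next unless obj["file"] # no allocation info recorded for this object
    counts[[obj["type"], obj["file"], obj["line"]]] += 1
  end
  counts.sort_by { |_site, count| -count }
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  report = File.open(ARGV[0]) do |before|
    File.open(ARGV[1]) { |after| new_objects_by_site(before, after) }
  end
  report.each do |(type, file, line), count|
    puts "Leaked #{count} #{type} objects at: #{file}:#{line}"
  end
end
```

Comparing by address is only a heuristic: a slot freed between the two dumps can be reused by a new object, which would then be missed.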
  27. Heap dump is a goldmine
      • Contains all cross object references
      • Contains all roots
      • Encoding for strings
      • Bytesize
      • GC information (oldgen vs new objects)
      • Actual string values
  28. Improving leak detection
      • Gather 3 snapshots
      • Remove common objects in snapshot 1 from 2
      • Remove missing objects in snapshot 3 from 2
      • GC.start may miss some objects
      • rbtrace could be gathering snapshots during requests
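
In set terms, the three-snapshot approach keeps the objects that are new in snapshot 2 and still alive in snapshot 3. A minimal sketch over object addresses; `likely_leaks` is a hypothetical helper and the addresses are made up for illustration:

```ruby
require "set"

# Objects new in snapshot 2 (not in 1) that survive into snapshot 3.
def likely_leaks(snapshot1, snapshot2, snapshot3)
  (snapshot2 - snapshot1) & snapshot3
end

snapshot1 = Set["0x10", "0x20"]                 # baseline
snapshot2 = Set["0x10", "0x20", "0x30", "0x40"] # after exercising the leak
snapshot3 = Set["0x10", "0x30", "0x50"]         # some time later

leaks = likely_leaks(snapshot1, snapshot2, snapshot3)
puts leaks.to_a.inspect
```

Here `0x40` is dropped because it was gone by the third snapshot, which filters out objects that were merely in flight when the second dump was taken.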
  29. Memory profiling
      • memory_profiler gem
      • Uses new allocation tracing API
      require 'objspace'
      ObjectSpace.trace_object_allocations {}
      ObjectSpace.allocation_sourceline
      ObjectSpace.allocation_class_path
      ObjectSpace.allocation_method_id
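
The calls listed on this slide can be tried directly. A small sketch that allocates one object while tracing is on and asks where it came from; `traced_allocation_info` is a hypothetical helper, and `allocation_sourcefile` is a sibling call from the same API that the slide does not list:

```ruby
require "objspace"

# Allocate one object under allocation tracing and collect what the
# ObjectSpace API recorded about the allocation site.
def traced_allocation_info
  info = nil
  ObjectSpace.trace_object_allocations do
    obj = Object.new
    info = {
      file:       ObjectSpace.allocation_sourcefile(obj),
      line:       ObjectSpace.allocation_sourceline(obj),
      class_path: ObjectSpace.allocation_class_path(obj),
      method_id:  ObjectSpace.allocation_method_id(obj)
    }
  end
  info
end

info = traced_allocation_info
puts "allocated at #{info[:file]}:#{info[:line]}"
puts "class path: #{info[:class_path].inspect}, method: #{info[:method_id].inspect}"
```

The lookups are done inside the block so the object is certainly still alive and tracing is still enabled when it is queried.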
  30. Retained vs Allocated
      • Retained are objects that are still around after the measured block
      • Allocated are objects allocated during the block of code
      • High retained = increased memory use, slower major GC
      • High allocated = slower perf, increased memory use
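
memory_profiler reports these two numbers broken down by allocation site. The distinction itself can be illustrated with `GC.stat` alone. A rough sketch, assuming the key names used by Ruby 2.2 and later (Ruby 2.1 spelled them in the singular, for example `total_allocated_object`); `measure_objects` is a hypothetical helper, and the retained figure is approximate because unrelated objects can come and go between the two GC runs:

```ruby
# Count objects allocated while the block runs, and estimate how many are
# still alive after a full GC once the block has finished.
def measure_objects
  GC.start
  live_before      = GC.stat(:heap_live_slots)
  allocated_before = GC.stat(:total_allocated_objects)

  yield

  allocated = GC.stat(:total_allocated_objects) - allocated_before
  GC.start
  retained = GC.stat(:heap_live_slots) - live_before
  { allocated: allocated, retained: retained }
end

kept = []
stats = measure_objects do
  1_000.times { kept << Object.new } # still referenced afterwards: retained
  1_000.times { Object.new }         # unreferenced: allocated but not retained
end

puts "allocated: #{stats[:allocated]}, retained (approx.): #{stats[:retained]}"
```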
  31. Report Card mime-types-1.25.1
      • Allocated Memory: 3.4MB
      • Allocated Objects: 68K
      • Retained Memory: ~0.85MB
      • Retained Objects: 18474
      • "application" retained 2379 times in types.rb (line 421 / 426)
  32. pg gem performs no casting
      "f" x 18
          gems/pg-0.15.1/lib/pg/result.rb:10 x 18
      - pg gem returns strings for dates, booleans, integers, floats.
      - ActiveRecord is stuck converting strings to native types in pure Ruby
      - Discussing with pg gem owners a fix that converts types in the C extension
  33. Resolving allocations
      • Option 1: Rewrite code to avoid after_initialize (preferred solution)
      • Option 2: Use Ruby 2.1 fstring: "string".freeze
      (Slide compares Ruby 2.0 with Ruby 2.1. Ruby 2.1: same object!)
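
The "same object" behaviour can be checked from a script on Ruby 2.1 or later. A minimal sketch; the variable names are for illustration only:

```ruby
# A frozen literal evaluated twice yields one shared object on Ruby 2.1+.
frozen = 2.times.map { "application".freeze }
same_object = frozen[0].equal?(frozen[1])

# Strings built at runtime are separate objects even when their contents match.
copies = 2.times.map { String.new("application") }
distinct = !copies[0].equal?(copies[1])

puts "frozen literals share an object: #{same_object}"
puts "runtime copies are distinct:     #{distinct}"
```

For a string that is retained thousands of times, as in the mime-types report, sharing one frozen object removes those duplicate allocations.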
  34. Awesome performance hooks
      • Memory profiling
      • Delayed job API
        • rb_postponed_job_register_one
      • Efficient stack traces
        • rb_profile_frames
      • C level trace points
        • rb_tracepoint_new
      • Much better transparency with stats
        • GC::INTERNAL_CONSTANTS
        • GC.stat
        • GC.latest_gc_info
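
The stats end of this list is reachable from plain Ruby. A short sketch that prints a few of the counters; the exact keys vary between Ruby versions, so the ones used here (`count`, `minor_gc_count`, `major_gc_count`, `gc_by`) are assumptions about the running interpreter:

```ruby
GC.start # make sure at least one collection has happened

stats = GC.stat            # Hash of GC counters
last  = GC.latest_gc_info  # Hash describing the most recent collection

puts "GC runs: #{stats[:count]} " \
     "(minor: #{stats[:minor_gc_count]}, major: #{stats[:major_gc_count]})"
puts "last GC triggered by: #{last[:gc_by].inspect}"

# Build-time GC parameters, guarded in case the constant is absent.
if GC.const_defined?(:INTERNAL_CONSTANTS)
  GC::INTERNAL_CONSTANTS.each { |name, value| puts "#{name}: #{value}" }
end
```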
  35. Other changes in 2.1
      • Refinements
      • Decimal literals? 0.1r
      • Required keyword arguments (a default value is now optional)
      • Method definition returns method name
      • String#scrub
      • Exception#cause
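
Several of these changes fit in a few lines each. A small sketch for Ruby 2.1 or later; the method and variable names are made up for illustration:

```ruby
# Rational literal: the r suffix gives an exact Rational, not a Float.
tenth = 0.1r

# Keyword argument without a default, so callers must pass it.
def greet(name:, greeting: "Hello")
  "#{greeting}, #{name}"
end

# def now returns the method name as a Symbol.
defined_name = def noop; end

# String#scrub replaces invalid byte sequences.
cleaned = "abc\xFF".scrub("?")

# Exception#cause links a re-raised error to the one being rescued.
wrapped =
  begin
    begin
      raise ArgumentError, "low level"
    rescue ArgumentError
      raise RuntimeError, "high level"
    end
  rescue RuntimeError => e
    e
  end

puts tenth.inspect
puts greet(name: "Sam")
puts defined_name.inspect
puts cleaned
puts wrapped.cause.class
```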
  36. Summary
      • Ruby MRI is getting faster and easier to diagnose
      • You can take advantage of the new interfaces today
      • Hold off on Ruby 2.1.0 in production, 2.1.1 will be safe
      • Don't apply optimisations blindly. ALWAYS BE MEASURING.
  37. Questions
      • Contact me via Twitter: @samsaffron
      • Join Ruby Rogues Parley and mention me
      • Ask on Stack Overflow and send me a link
  38. Resources
      • Demystifying The Ruby GC: http://samsaffron.com/archive/2013/11/22/demystifying-the-ruby-gc
      • Jemalloc in Ruby core: https://bugs.ruby-lang.org/issues/9113
      • Aman's blog for Ruby 2.1 news: http://tmm1.net/
      • memory_profiler: https://github.com/SamSaffron/memory_profiler
      • rack_mini_profiler: https://github.com/miniprofiler/rack-mini-profiler
      • stackprof: https://github.com/tmm1/stackprof
      • flamegraph: https://github.com/samsaffron/flamegraph
      • Global class cache change to Ruby core: https://bugs.ruby-lang.org/issues/9262
      • Changes in Ruby 2.1: http://rkh.im/ruby-2.1
      • Object management on Ruby 2.1: http://www.confreaks.com/videos/2866-rubyconf2013-object-management-on-ruby-2-1
      • Raw bench stats: https://gist.github.com/SamSaffron/9029928
      • Call to action long running benchmark: http://samsaffron.com/archive/2013/12/11/call-to-action-long-running-ruby-benchmark
      • Discourse bench and other scripts: https://github.com/discourse/discourse/tree/master/script