Slide 1

Why Ruby 2.1 excites me?!
Sam Saffron

Slide 2

Who am I?
• Sam Saffron
• @samsaffron
• samsaffron.com
• Co-founder, Discourse
• Previously at Stack Overflow
• Premature optimiser

Slide 3

About Discourse
• Next generation forum software
• Rails 4 and Rails master
• Postgres / Redis
• Ember.js
• Open Source
• We host on unicorn / nginx / haproxy

Slide 4

My Gems
• rack-mini-profiler: visualize performance of your web apps
• flamegraph: graphs to visualize stack traces
• message_bus: long polling support for rack apps and group messaging
• fast_blank: native rewrite of blank?, free perf bump for rails apps
• lru_redux: fastest LRU cache implementation available for Ruby

Slide 5

No content

Slide 6

Agenda
• Is Ruby 2.1 faster than 2.0?
• Tuning Ruby 2.1
• Tracking memory leaks with Ruby 2.1
• Tracking memory usage with Ruby 2.1
• Future work

Slide 7

Warning
Ruby 2.1.0 is not ready for production.
Many issues are addressed in 2.1.1 (not yet released as of 19 Feb 2014).
SEGV in Rails, broken faraday gem, broken excon gem.
Memory usage is much higher.
- More info at: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59728

Slide 8

Discourse Bench
• Simulate 1000 topics and 100 users
• Look at various pages on Discourse both as admin and anon
• Run the Discourse spec suite
• Look at production vs default stacks
• Run the bench on bare metal with a stable CPU (cpufreq-util set to performance)

Slide 9

Discourse bench

Slide 10

Sample Results
categories:
  50: 89
  75: 158
  90: 160
  99: 165
home:
  50: 51
  75: 54
  90: 120
  99: 123
topic:
  50: 14
  75: 15
  90: 16
  99: 85
timings:
  load_rails: 3802
ruby-version: 2.0.0-p353
rss_kb: 146328
pss_kb: 142437
architecture: amd64
operatingsystem: Ubuntu
kernelversion: 3.8.0
memorysize: 23.55 GB
physicalprocessorcount: 1
processor0: Intel(R) Core(TM) i7 CPU 960 @
virtual: phys
categories_admin:
  50: 109
  75: 175
  90: 179
  99: 207
home_admin:
  50: 65
  75: 129
  90: 134
  99: 138
topic_admin:
  50: 24
  75: 25
  90: 27
  99: 96
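The sample results report each page at the 50th, 75th, 90th and 99th percentiles. As a rough illustration of how such numbers can be derived from raw per-request timings, here is a nearest-rank percentile helper. The method names are hypothetical, and the nearest-rank definition is an assumption: the actual bench script may interpolate or compute percentiles differently.

```ruby
# Nearest-rank percentile (hypothetical helper): the smallest sample such that
# at least pct% of all samples are less than or equal to it. pct is an Integer 1..100.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?
  sorted = samples.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceil(pct / 100 * n), 1-based
  sorted[rank - 1]
end

# Summarise per-request timings at the percentiles the bench reports.
def summarise(timings)
  [50, 75, 90, 99].each_with_object({}) { |pct, h| h[pct] = percentile(timings, pct) }
end

timings = [14, 15, 14, 16, 85, 14, 15, 13, 15, 16]
p summarise(timings)
```

Integer arithmetic for the rank avoids floating-point surprises such as 0.9 * 10 landing just above 9.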

Slide 11

Ruby 2.1.0 vs Ruby 2.0.0-p353
[Chart: Memory. RSS (KB), heap used (KB) and post-GC RVALUEs (×1000) for 2.0.0-p353 and 2.1.0]

Slide 12

Page load time
[Chart: 2.0 vs 2.1 median and 90th percentile for Home, Home (admin), Categories and Categories (admin)]

Slide 13

Summary (out-of-the-box setup)
• 2.1.0 consumes much more memory (75% RSS increase)
• 2.1.0 is faster for the median (1-2% faster)
• 2.1.0 is significantly faster for the 90th percentile (in some cases 60%+)

Slide 14

Key performance improvements
• Restricted Generational GC (RGenGC) – by Koichi Sasada
• Granular global method cache invalidation – by James Golick (ported to 2.1 by Charlie Somerville)
• Reduced object count on boot – by Aman Gupta
• Additional GC tuning ENV vars – Aman and Koichi
• Frozen string cache – Charlie and Koichi

Slide 15

Production optimisations
• Custom Ruby builds
• GC tuning
• Unicorn + OOBGC
• LD_PRELOAD=jemalloc

Slide 16

GitHub Ruby 2.1.0 build
• Patch set managed by Aman Gupta (Ruby core developer)
• Used at GitHub in production
• Contains fixes for all urgent 2.1.0 issues found to date
• Contains performance patch sets, notably a vastly improved method cache by funny-falcon
• 5-10% faster across the board for Discourse bench
• https://github.com/github/ruby

Slide 17

GitHub 2.1 GC tuning
• RUBY_GC_HEAP_INIT_SLOTS=600000
  • Avoids initial expansion of heaps, cuts down GC on startup
• RUBY_GC_HEAP_FREE_SLOTS=600000
  • Ensures enough free heap space for a large number of requests (4096 by default)
• RUBY_GC_HEAP_GROWTH_FACTOR=1.25
  • Grows heaps more slowly (1.8 by default)
• RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000
  • Caps heap growth (not set by default)
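To get a feel for how these settings interact, here is a simplified model of heap expansion: each expansion multiplies the slot count by RUBY_GC_HEAP_GROWTH_FACTOR, but adds no more than RUBY_GC_HEAP_GROWTH_MAX_SLOTS when that is set. It is an approximation for illustration only (MRI grows the heap in whole pages, and the details vary by version); the helper names are hypothetical and the fallback defaults are the ones quoted above.

```ruby
# Read a numeric GC setting from the environment, falling back to a default.
def gc_setting(name, default)
  value = ENV[name]
  value.nil? || value.empty? ? default : Float(value)
end

# Simplified model of one heap expansion (not MRI's page-level logic):
# grow by `factor`, but never add more than `max_slots` slots (0 = uncapped).
def next_heap_slots(slots, factor, max_slots)
  grown = (slots * factor).floor
  max_slots > 0 ? [grown, slots + max_slots].min : grown
end

# Heap size after `n` expansions starting from `init` slots.
def heap_after(n, init:, factor:, max_slots:)
  n.times.reduce(init) { |slots, _| next_heap_slots(slots, factor, max_slots) }
end

factor    = gc_setting("RUBY_GC_HEAP_GROWTH_FACTOR", 1.8)
max_slots = gc_setting("RUBY_GC_HEAP_GROWTH_MAX_SLOTS", 0).to_i
p heap_after(3, init: 600_000, factor: factor, max_slots: max_slots)
```

With the GitHub values, small heaps grow by 25% per step, while large heaps grow by at most 300000 slots at a time instead of nearly doubling.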

Slide 18

GC tuning effect on bench
• Performance decrease ☹ (~5%)
• Heap size much reduced ☺
• RSS decrease: 266MB -> 241MB
• Still faster than Ruby 2.1.0 stock

Slide 19

Out-of-band GC
• Use the gctools gem
• gctools also contains a GC tracer (prints lazy sweep vs minor vs major)
• Invoke GC::OOB.run after every request (post process)
• Works with unicorn, may work with passenger in future
• GC is NOT disabled, no need for unicorn killers etc.
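The core idea of out-of-band GC can be sketched without gctools: once a response has gone out, and before the worker takes its next request, check whether the heap is running low on free slots and, if so, collect right away so the pause doesn't land in the middle of the next request. The class below is a hypothetical illustration, not gctools' implementation (which builds on the new C-level GC hooks in 2.1 to decide more precisely). The threshold is an assumption, and `heap_free_slots` is the GC.stat key name in Ruby 2.2 and later; Ruby 2.1 used slightly different key names.

```ruby
# Hypothetical out-of-band GC helper (not gctools' implementation).
# Call #after_request once the response has been written to the client.
class OutOfBandGC
  def initialize(min_free_slots:,
                 stat: -> { GC.stat(:heap_free_slots) }, # Ruby 2.2+ key name
                 gc: -> { GC.start })
    @min_free_slots = min_free_slots
    @stat = stat
    @gc = gc
  end

  # Collects only when free slots are running low; returns true if it did.
  def after_request
    return false if @stat.call >= @min_free_slots
    @gc.call
    true
  end
end

oob = OutOfBandGC.new(min_free_slots: 10_000)
oob.after_request
```

The GC stays enabled throughout, so a request that allocates heavily still collects normally; the helper just moves most collections into the gap between requests.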

Slide 20

OOBGC impact
[Chart: home page and topic page, median and 99th percentile, 2.1.0 vs 2.1.0 + OOBGC]

Slide 21

OOBGC impact
• Similar perf for median requests
• Significantly reduced PSS, i.e. the memory cost of each child (20-30% improvement with 3 children)
• Significantly better perf for the 99th percentile: can be 50% faster
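PSS (proportional set size) splits each shared page evenly among the processes mapping it, so forked unicorn children that still share copy-on-write pages with the master report a PSS well below their RSS. On Linux the per-mapping figures live in /proc/<pid>/smaps; a minimal sketch that totals them (the helper names are hypothetical, and the file only exists on Linux):

```ruby
# Sum the "Pss:" lines of a /proc/<pid>/smaps dump; values are in kB.
# Lines such as "Pss_Anon:" or "SwapPss:" are deliberately not counted.
def pss_kb(smaps_text)
  smaps_text.each_line.sum do |line|
    line.start_with?("Pss:") ? line.split[1].to_i : 0
  end
end

# PSS of a live process (Linux only).
def process_pss_kb(pid = Process.pid)
  pss_kb(File.read("/proc/#{pid}/smaps"))
end

puts "PSS: #{process_pss_kb} kB" if File.exist?("/proc/self/smaps")
```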

Slide 22

Reducing RSS further
• Ruby 2.1.1 (and the GitHub branch) introduce RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
• Setting it to 1.5 reduces RSS from 248MB -> 197MB
• Will impact perf a bit in some cases; OOBGC reduces the impact
• You can effectively disable RGenGC by setting it to 0.9
• Combined with OOBGC it reduces RSS to 176MB (130MB PSS for 3 workers)
• That is only slightly higher than 2.0! (100MB PSS)
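A simplified model shows why a factor below 1 switches generational GC off in practice: after each major GC the runtime records a limit of roughly (old objects surviving that GC) × factor, and once the old-object count exceeds that limit the next GC is forced to be major. With 0.9 the limit is already below the surviving count, so every GC is major. This is an approximation of MRI's bookkeeping for illustration; the method name is hypothetical, and the 2.0 default used below is an assumption.

```ruby
# Simplified model (not MRI's code): after a major GC the limit becomes
#   old_objects_after_last_major * factor
# and the next GC is forced to be major once old objects exceed it.
def major_gc_due?(old_objects_now, old_objects_after_last_major, factor)
  old_objects_now > old_objects_after_last_major * factor
end

# Assumed default of 2.0: old objects may roughly double before a major GC.
p major_gc_due?(150_000, 100_000, 2.0)
# 1.5 triggers major GCs sooner, trading some speed for a smaller heap.
p major_gc_due?(150_001, 100_000, 1.5)
# 0.9: over the limit straight after a major GC, so every GC is major.
p major_gc_due?(100_000, 100_000, 0.9)
```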

Slide 23

Ruby 2.1 OOBGC
• New C-level hooks to inform on GC start / stop
• New C-level extension points to view internal state
• Not compatible with Ruby 2.0
• Not (yet) compatible with passenger – PR in open discussion

Slide 24

jemalloc
• Replacement malloc library
• If you make install from source, inject it using LD_PRELOAD=/usr/local/lib/libjemalloc.so
• 6% RSS reduction under 2.1.0 – particularly effective when heaps get big
• Similar performance to glibc malloc
• Keeps RSS low over the long run
• Other option is tcmalloc (used by GitHub on most setups)
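Because a wrong LD_PRELOAD path only produces a loader warning and the process carries on with the default malloc, it's worth confirming that jemalloc actually got loaded. One Linux-only check, sketched here with a hypothetical helper, is to look for the library among the process's memory mappings in /proc/<pid>/maps:

```ruby
# True if any mapped file in a /proc/<pid>/maps dump looks like jemalloc.
def jemalloc_mapped?(maps_text)
  maps_text.each_line.any? do |line|
    path = line.split[5] # 6th column is the backing file; absent for anonymous maps
    !path.nil? && File.basename(path).start_with?("libjemalloc")
  end
end

if File.exist?("/proc/self/maps")
  loaded = jemalloc_mapped?(File.read("/proc/self/maps"))
  puts loaded ? "jemalloc loaded" : "jemalloc not loaded"
end
```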

Slide 25

Run your tests faster with Ruby 2.1.0
• 2.1.0 is mostly compatible with 2.0.0
• 2.1.0 GitHub edition avoids nasty segfaults during spec runs
• Thanks to Koichi Sasada we no longer need GC malloc voodoo env vars to make specs fast

Slide 26

Run your tests faster with Ruby 2.1.0
[Chart: run time in seconds for Discourse bench (lower is better) for 2.0.0-p353, 2.0.0-p353 with RUBY_GC_MALLOC_LIMIT=40000000, 2.1.0 and 2.1.0 GitHub; two "42% faster" callouts]

Slide 27

Tracking a memory leak with Ruby 2.1
• Initial detection: ps reports RSS increasing over time
• Use the rbtrace gem (require it in the process being analysed; low impact, safe in production)
• Collect heap dump
• Cause leak
• Collect heap dump
• Compare

Slide 28

Collecting a heap dump with rbtrace
rbtrace -p 15193 -e 'Thread.new{require "objspace"; ObjectSpace.trace_object_allocations_start; GC.start(); File.open("heap.json","w"){|f| ObjectSpace.dump_all(output: f)}}.join'

Slide 29

Breaking it down
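Breaking the rbtrace payload down, the same steps can be run directly inside a Ruby process. A minimal standalone sketch follows; the leak is simulated, and writing to the system temp directory is an arbitrary choice.

```ruby
require "objspace"
require "json"
require "tmpdir"

# Record file/line for every allocation from here on. Objects created before
# this call carry no allocation info, which is why the workflow dumps once,
# causes the leak, and then dumps again.
ObjectSpace.trace_object_allocations_start

leaky = []
1_000.times { |i| leaky << "leaked string #{i}" } # stand-in for a real leak

GC.start # collect garbage so the dump holds live objects only

# dump_all writes one JSON document per object, one per line; the block form
# of File.open guarantees the file is flushed and closed.
path = File.join(Dir.tmpdir, "heap.json")
File.open(path, "w") { |f| ObjectSpace.dump_all(output: f) }
ObjectSpace.trace_object_allocations_stop

strings_here = File.foreach(path).count do |line|
  next false unless line.include?("STRING")
  obj = (JSON.parse(line) rescue nil) # skip any line that fails to parse
  obj && obj["type"] == "STRING" && obj["file"] == __FILE__
end
puts "#{strings_here} live strings allocated in #{__FILE__}"
```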

Slide 30

Causing leak
% ab -n 100 http://localhost:3000/

Slide 31

Analyzing results

Slide 32

Results
sam@ubuntu discourse % ruby script/diff_heaps.rb heap.json heap2.json
Leaked 10172 STRING objects at: /home/sam/Source/discourse/config/environments/development.rb:59
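diff_heaps.rb itself isn't reproduced here, so the following is a hypothetical re-creation of the idea rather than the Discourse script: collect every object address from the first dump, then count the objects in the second dump that are new and carry allocation info, grouped by type and file:line.

```ruby
require "json"
require "set"

# Yield each parsed object from a dump_all file (one JSON document per line),
# skipping any line that fails to parse.
def each_heap_object(path)
  File.foreach(path) do |line|
    obj = (JSON.parse(line) rescue nil)
    yield obj if obj
  end
end

# Objects in `after_path` whose address was not in `before_path`, counted by
# [type, "file:line"], largest first. (An address freed and reused between
# the two dumps is missed by this simple check.)
def diff_heaps(before_path, after_path)
  seen = Set.new
  each_heap_object(before_path) { |o| seen << o["address"] if o["address"] }

  counts = Hash.new(0)
  each_heap_object(after_path) do |o|
    next if o["address"].nil? || o["file"].nil? || seen.include?(o["address"])
    counts[[o["type"], "#{o["file"]}:#{o["line"]}"]] += 1
  end
  counts.sort_by { |_, n| -n }
end

if __FILE__ == $0 && ARGV.size == 2
  diff_heaps(*ARGV).each do |(type, location), count|
    puts "Leaked #{count} #{type} objects at: #{location}"
  end
end
```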

Slide 33

Line 59: development.rb

Slide 34

Heap dump is a goldmine
• Contains all cross-object references
• Contains all roots
• Encoding for strings
• Bytesize
• GC information (old-gen vs new objects)
• Actual string values
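Because every object's outgoing references are recorded, a dump can also answer "who is holding on to this object?". A hypothetical sketch that builds a reverse index from a dump_all file and follows it back from a leaked object's address towards a root:

```ruby
require "json"

# Map each address to the addresses (or "ROOT:<name>" labels) that reference it.
def reverse_references(path)
  parents = Hash.new { |h, k| h[k] = [] }
  File.foreach(path) do |line|
    obj = (JSON.parse(line) rescue nil)
    next unless obj && obj["references"]
    holder = obj["address"] || "ROOT:#{obj["root"]}"
    obj["references"].each { |child| parents[child] << holder }
  end
  parents
end

# Follow the first parent repeatedly to get one retention chain, stopping at
# a root, a cycle, or max_depth steps.
def retention_chain(parents, address, max_depth: 10)
  chain = [address]
  max_depth.times do
    parent = parents.fetch(chain.last, []).first
    break if parent.nil? || chain.include?(parent)
    chain << parent
  end
  chain
end
```

Only the first parent is followed, so this finds one path to a root rather than every path; that is usually enough to spot the container that keeps a leaked object alive.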

Slide 35

Improving leak detection
• Gather 3 snapshots
• Remove objects in snapshot 2 that already existed in snapshot 1
• Remove objects in snapshot 2 that are missing from snapshot 3
• GC.start may miss some objects
• rbtrace could be gathering snapshots during requests
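The three-snapshot filter is plain set arithmetic over object addresses. A minimal sketch (helper names hypothetical): objects that are new in snapshot 2 but gone by snapshot 3 were only temporary, for example allocated by a request that was in flight when the snapshot was taken.

```ruby
require "json"
require "set"

# Addresses of all objects in a dump_all file (unparseable lines are skipped).
def heap_addresses(path)
  File.foreach(path).each_with_object(Set.new) do |line, set|
    obj = (JSON.parse(line) rescue nil)
    set << obj["address"] if obj && obj["address"]
  end
end

# New since snapshot 1, and still alive in snapshot 3.
def suspected_leaks(snap1, snap2, snap3)
  (snap2 - snap1) & snap3
end

if __FILE__ == $0 && ARGV.size == 3
  leaks = suspected_leaks(*ARGV.map { |path| heap_addresses(path) })
  puts "#{leaks.size} objects look leaked"
end
```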

Slide 36

Memory profiling
• memory_profiler gem
• Uses the new allocation tracing API:
require 'objspace'
ObjectSpace.trace_object_allocations { }
ObjectSpace.allocation_sourceline(obj)
ObjectSpace.allocation_class_path(obj)
ObjectSpace.allocation_method_id(obj)
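A small, self-contained use of the tracing API. An array literal is used so the allocation happens directly in the frame of `build`; the class and method are made up for the example.

```ruby
require "objspace"

class Widget
  def build(n)
    [n, n] # array literal: allocated directly in this method's frame
  end
end

# trace_object_allocations returns the block's value; query the allocation
# info while tracing is active.
info = ObjectSpace.trace_object_allocations do
  obj = Widget.new.build(1)
  {
    file:   ObjectSpace.allocation_sourcefile(obj),
    line:   ObjectSpace.allocation_sourceline(obj),
    klass:  ObjectSpace.allocation_class_path(obj),
    method: ObjectSpace.allocation_method_id(obj)
  }
end

p info
```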

Slide 37

Analysing Rails startup with memory profiler

Slide 38

Retained vs Allocated
• Retained objects are still around after the measured block
• Allocated objects are those allocated during the block of code
• High retained = increased memory use, slower major GC
• High allocated = slower perf, increased memory use
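Both numbers can be measured with the same tracing API. The helper below is a hypothetical sketch, similar in spirit to what memory_profiler does but far less thorough: with GC disabled, tag allocations with the current GC count, count the tagged objects, then run a full GC and count how many of them survive.

```ruby
require "objspace"

# Hypothetical helper: returns [allocated, retained] object counts for the block.
def allocated_and_retained
  GC.start
  GC.disable                  # nothing is freed while we measure
  generation = GC.count       # traced objects record the GC count at allocation
  ObjectSpace.trace_object_allocations_start
  begin
    yield
  ensure
    ObjectSpace.trace_object_allocations_stop
  end

  # Allocated: every live object tagged with our generation.
  allocated = {}
  ObjectSpace.each_object do |obj|
    allocated[obj.__id__] = true if ObjectSpace.allocation_generation(obj) == generation
  end

  # Retained: the subset that survives a full GC.
  GC.enable
  GC.start
  retained = 0
  ObjectSpace.each_object do |obj|
    next unless ObjectSpace.allocation_generation(obj) == generation
    retained += 1 if allocated.key?(obj.__id__) # guards against reused slots
  end
  [allocated.size, retained]
ensure
  GC.enable
  ObjectSpace.trace_object_allocations_clear
end
```

Usage: objects the block stores somewhere long-lived count as retained; temporaries count only as allocated.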

Slide 39

Report Card mime-types-1.25.1
• Allocated Memory: 3.4MB
• Allocated Objects: 68K
• Retained Memory: ~0.85MB
• Retained Objects: 18474
• “application” retained 2379 times: types.rb (lines 421 / 426)

Slide 40

A simple Active Record query
328 objects allocated
User has 48 columns

Slide 41

“acts_like_time?” allocated 18 times
0 allocations!
active_support

Slide 42

Allocates 7 strings per date!

Slide 43

pg gem performs no casting
"f" x 18    gems/pg-0.15.1/lib/pg/result.rb:10 x 18
- pg gem returns strings for dates, booleans, integers, floats.
- ActiveRecord is stuck converting strings to native types in pure Ruby.
- Discussing with pg gem owners a fix that converts types in the C extension.
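To make the cost concrete, here is a minimal sketch of the per-value conversion that has to happen in Ruby when the driver hands back every column as a String. The method and its type symbols are hypothetical, not ActiveRecord's or pg's API; the input formats assume PostgreSQL's text output ("t"/"f" for booleans, ISO dates under the default DateStyle).

```ruby
require "date"

# Hypothetical caster: turn the text a driver returns into Ruby objects.
# Every value pays for a parse in Ruby, once per row per column.
def cast_pg_value(value, type)
  return nil if value.nil?
  case type
  when :boolean then value == "t"
  when :integer then Integer(value, 10)
  when :float   then Float(value)
  when :date    then Date.iso8601(value)
  else value
  end
end

row = { "active" => "f", "id" => "42", "score" => "3.5", "created_on" => "2014-02-19" }
types = { "active" => :boolean, "id" => :integer, "score" => :float, "created_on" => :date }
p row.map { |col, v| [col, cast_pg_value(v, types[col])] }.to_h
```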

Slide 44

Excess allocation of String
Called with symbol

Slide 45

Resolving allocations
• Option 1: Rewrite code to avoid after_initialize (preferred solution)
• Option 2: Use Ruby 2.1 fstring: "string".freeze
Ruby 2.1: same object !!!
Ruby 2.0
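A quick way to see the fstring behaviour: on Ruby 2.1 and later, `"literal".freeze` yields one shared frozen String, so repeated evaluations return the same object. (Per the slide, Ruby 2.0 allocated a fresh copy each time.)

```ruby
# The same call site evaluated three times yields the same frozen object.
ids = Array.new(3) { "string".freeze.object_id }
puts ids.uniq.size # 1 on Ruby 2.1+

# Typical use: a hot method that would otherwise allocate a String per call.
def column_name
  "created_at".freeze
end
puts column_name.equal?(column_name) # true: no new String per call
```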

Slide 46

Awesome performance hooks
• Memory profiling
• Delayed job API
  • rb_postponed_job_register_one
• Efficient stack traces
  • rb_profile_frames
• C level trace points
  • rb_tracepoint_new
• Much better transparency with stats
  • GC::INTERNAL_CONSTANTS
  • GC.stat
  • GC.latest_gc_info
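The C-level hooks need an extension, but the stats are easy to poke at from Ruby. A short sketch (key names vary between Ruby versions, so treat the specific keys as assumptions for your version):

```ruby
# Ruby-level GC introspection (Ruby 2.1+).
before = GC.stat
200_000.times { |i| "temporary #{i}" } # short-lived garbage
GC.start                               # explicit full (major) collection
info  = GC.latest_gc_info              # how the most recent GC was triggered
after = GC.stat

puts "minor GCs: #{after[:minor_gc_count] - before[:minor_gc_count]}"
puts "major GCs: #{after[:major_gc_count] - before[:major_gc_count]}"
p info                                 # includes gc_by: :method for GC.start
p GC::INTERNAL_CONSTANTS.keys          # heap layout constants
```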

Slide 47

Other changes in 2.1
• Refinements
• Rational literals: 0.1r
• Required keyword arguments (default values can be omitted)
• Method definition returns the method name
• String#scrub
• Exception#cause
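A few of these in action; the snippet runs on Ruby 2.1 and later, and the method names in it are made up for the example.

```ruby
# Rational literal: exact decimal arithmetic, unlike Float.
sum = 0.1r + 0.2r
p sum        # => (3/10)
p 0.1 + 0.2  # => 0.30000000000000004

# Required keyword argument: no default, so omitting it raises ArgumentError.
def connect(host:, port: 5432)
  "#{host}:#{port}"
end
p connect(host: "localhost") # => "localhost:5432"

# `def` returns the method name as a Symbol.
name = def helper; end
p name # => :helper

# String#scrub replaces invalid byte sequences.
p "abc\xFFdef".scrub("?") # => "abc?def"

# Exception#cause holds the exception that was being handled.
cause_class = begin
  begin
    raise KeyError, "missing"
  rescue KeyError
    raise "lookup failed"
  end
rescue RuntimeError => e
  e.cause.class
end
p cause_class # => KeyError
```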

Slide 48

String#scrub

Slide 49

The Future
• GC improvement, revised OLDGEN promotion strategy
• Long running benchmarks

Slide 50

Long running Ruby benchmark

Slide 51

Summary
• Ruby MRI is getting faster and easier to diagnose
• You can take advantage of the new interfaces today
• Hold off on Ruby 2.1.0 in production; 2.1.1 will be safe
• Don’t apply optimisations blindly. ALWAYS BE MEASURING.

Slide 52

Questions
• Contact me via Twitter: @samsaffron
• Join Ruby Rogues Parley and mention me
• Ask on Stack Overflow and send me a link

Slide 53

Resources
• Demystifying The Ruby GC: http://samsaffron.com/archive/2013/11/22/demystifying-the-ruby-gc
• Jemalloc in Ruby core: https://bugs.ruby-lang.org/issues/9113
• Aman’s blog for Ruby 2.1 news: http://tmm1.net/
• memory_profiler: https://github.com/SamSaffron/memory_profiler
• rack_mini_profiler: https://github.com/miniprofiler/rack-mini-profiler
• stackprof: https://github.com/tmm1/stackprof
• flamegraph: https://github.com/samsaffron/flamegraph
• Global class cache change to Ruby core: https://bugs.ruby-lang.org/issues/9262
• Changes in Ruby 2.1: http://rkh.im/ruby-2.1
• Object management on Ruby 2.1: http://www.confreaks.com/videos/2866-rubyconf2013-object-management-on-ruby-2-1
• Raw bench stats: https://gist.github.com/SamSaffron/9029928
• Call to action long running benchmark: http://samsaffron.com/archive/2013/12/11/call-to-action-long-running-ruby-benchmark
• Discourse bench and other scripts: https://github.com/discourse/discourse/tree/master/script