On sending methods

9d2f78236e45a335301ba1195026105d?s=47 Urabe Shyouhei
September 04, 2020

On sending methods

9d2f78236e45a335301ba1195026105d?s=128

Urabe Shyouhei

September 04, 2020
Tweet

Transcript

  1. Photo by Tomoyuki Kengaku On sending methods Urabe, Shyouhei

  2. About myself • Urabe, Shyouhei • @shyouhei • Ruby committer

    @ Money Forward, Inc. • 2020’s #2 most active `ruby/ruby` committer, according to GitHub.
  3. None
  4. Motivations Photo by Tomoyuki Kengaku

  5. One day, I was profiling `ruby` using `discourse/discourse`’s `script/bench.rb`. Then

    I found something…
  6. None
  7. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  8. This is WRONG!

  9. The problem Photo by Tomoyuki Kengaku

  10. So what is vm_call_iseq_setup? zsh % global -xr vm_call_iseq_setup vm_call_iseq_setup

    134 vm_eval.c vm_call_iseq_setup(ec, reg_cfp, calli vm_call_iseq_setup 2757 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, T vm_call_iseq_setup 2758 vm_insnhelper.c return vm_call_iseq_setup(ec, cfp, callin zsh %
  11. So what is vm_call_iseq_setup? • It sets up VM stack

    frame, right at the moment when a method is about to run. • This function is not designed to be called frequently. • Normal method calls must skip it, because there are inline method caches.
  12. Inline method cache def foo pp self end

  13. Inline method cache == disasm: #<ISeq:foo@-e:1 (1,0)-(1,21)> (catch: FALSE) 0000

    putself ( 1)[LiCa 0001 putself 0002 opt_send_without_block <calldata!mid:pp, argc:1, FCALL|ARGS_SIMPLE> 0004 leave [Re]
  14. None
  15. None
  16. None
  17. What is cc->call() ?

  18. zsh % global -xr CC_SET_FASTPATH CC_SET_FASTPATH 1635 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup_func(ci,

    param_size, CC_SET_FASTPATH 2239 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, TRUE); CC_SET_FASTPATH 2244 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_cfunc, TRUE); CC_SET_FASTPATH 2251 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_attrset, !((ci->flag & VM_CALL_AR CC_SET_FASTPATH 2258 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_ivar, !(ci->flag & VM_CALL_ARGS_ CC_SET_FASTPATH 2263 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2267 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_bmethod, TRUE); CC_SET_FASTPATH 2278 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_send, TRUE); CC_SET_FASTPATH 2281 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_call, TRUE); CC_SET_FASTPATH 2284 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_block_call, TRUE); CC_SET_FASTPATH 2361 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2382 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2492 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2497 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_super_method, TRUE); What is cc->call() ?
  19. None
  20. None
  21. None
  22. None
  23. None
  24. In short… • `vm_call_iseq_setup` is called when inline method cache

    mishits. • This function to show up in the #2 most frequent operation in a profiler output means the cache mishits very frequently. • This is of course not by design!
  25. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters:
  26. None
  27. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this:
  28. [RUBY_DEBUG_COUNTER] 42122 killed by signal. [RUBY_DEBUG_COUNTER] mc_inline_hit 928210882 [RUBY_DEBUG_COUNTER] mc_inline_miss

    93975080 [RUBY_DEBUG_COUNTER] mc_global_hit 226048406 [RUBY_DEBUG_COUNTER] mc_global_miss 24044279 [RUBY_DEBUG_COUNTER] mc_global_state_miss 9246312 [RUBY_DEBUG_COUNTER] mc_class_serial_miss 84728768 [RUBY_DEBUG_COUNTER] mc_cme_complement 40695315 [RUBY_DEBUG_COUNTER] mc_cme_complement_hit 40690629 [RUBY_DEBUG_COUNTER] mc_search_super 242234049 [RUBY_DEBUG_COUNTER] mc_miss_by_nome 35088 [RUBY_DEBUG_COUNTER] mc_miss_by_distinct 31311981 [RUBY_DEBUG_COUNTER] mc_miss_by_refine 0 [RUBY_DEBUG_COUNTER] mc_miss_by_visi 1002898 [RUBY_DEBUG_COUNTER] mc_miss_spurious 61625113 [RUBY_DEBUG_COUNTER] ccf_general 37693318 [RUBY_DEBUG_COUNTER] ccf_iseq_setup 105021309
  29. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this: 928,210,882 928,210,882 + 93,975,080 ≅ 90.80%
  30. The design

  31. None
  32. Conclusion • Our inline method cache mechanism is not working

    as intended.
  33. Inspection Photo by Tomoyuki Kengaku

  34. Deep dive into cache mishits • Now that we know

    the cache mechanism is somehow not working… • In order to fix it we have to know what exactly is going on when a cache mishits.
  35. None
  36. zsh % LC_ALL=C bundle exec script/bench.rb 2>&1 | sort |

    uniq -c | sort -nr 14894134 cache eviction: ActiveModel::AttributeSet#fetch_value is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:39 4139225 cache eviction: Concurrent::Map#[] is called from concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/map.rb:133 3912799 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 3329993 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 1739540 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1722911 cache eviction: ActionView::OutputBuffer#<< is called from actionview-6.0.3.1/lib/action_view/buffers.rb:29 1670132 cache eviction: Module#site_id_lookup is called from message_bus-3.3.1/lib/message_bus/distributed_cache.rb:154 1285677 cache eviction: ActiveRecord::Type::HashLookupTypeMap#fetch is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/postgresql_adapte 1158602 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1111963 cache eviction: ActionDispatch::Journey::Visitors::Each#visit is called from actionpack-6.0.3.1/lib/action_dispatch/journey/visitors.rb:162 1079094 cache eviction: NilClass#blank? is called from activesupport-6.0.3.1/lib/active_support/core_ext/object/blank.rb:26 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#options is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#active_record is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 940974 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Attributes_Attribute is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 740641 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/associations/belongs_to_association.rb:119 686061 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 675719 cache eviction: ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#_quote is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/po 610875 cache eviction: User#respond_to? is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:212 609323 cache eviction: Class#type_for_attribute is called from activerecord-6.0.3.1/lib/active_record/table_metadata.rb:33 604621 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 555429 cache eviction: User#read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:323 555429 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 544500 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Nodes_BindParam is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 541906 cache eviction: ActiveModel::Type::Integer#cast is called from activemodel-6.0.3.1/lib/active_model/type/helpers/numeric.rb:19
  37. Something is happening on `active_record/attribute_methods/read.rb`!

  38. None
  39. None
  40. So? • `ActiveRecord::AttributeMethods::Read` is… • A Module. • Included from

    (`discourse`’s) `User` etc. • Which means… • Its `#read_attribute` is shared across several classes.
  41. User Post Topic AR::Base AR::AttributeMethods AR::AM::Read Inherit Include Include #read_attribute

    #_read_attribute Calls
  42. None
  43. Actual scenario 1. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2.

    Populates its call cache with `User#_read_attribute` 2. `Post#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `User#_read_attribute` entry and populates its own `Post#_read_attribute` 3. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `Post#_read_attribute` entry and populates its own `User#_read_attribute`
  44. Fix it Photo by Tomoyuki Kengaku

  45. Recap the problem • Inline method cache mishits too frequently.

    • Mishits happen when a method is shared among several classes possibly due to: • Inheritance • Module inclusion
  46. How do we tackle this? • Inline call cache to

    consider class hierarchy? • NG; there are refinements. Method resolution can vary from call site to call site. Class hierarchy is not suitable for the needs. • Also, it is dead slow to scan inheritance tree every time a method is called. • Remember (class, method) tuple for each and every call site • → Way to go.
  47. The new design • An inline cache was a map

    from (class, symbol) tuple to a method body. We modified it to stores multiple classes, which share the identical method name for identical method body. • Ko1 told me that this is a variant of polymorphic inline caching. • Hölzle et al., “Optimizing dynamically-typed object-oriented languages with polymorphic inline caches” in Proceedings of ECOOP ’91. LNCS 512. https://doi.org/10.1007/BFb0057013
  48. None
  49. Expand here to store multiple serials.

  50. None
  51. None
  52. None
  53. Benchmarks Photo by Tomoyuki Kengaku

  54. Inline cache hit rate improvement before after mc_inline_hit 928,210,882 971,641,144

    mc_inline_miss 93,975,080 50,482,155 90.80% 95.06%
  55. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  56. Samples: 40K of event 'cycles', Event count (approx.): 502933506043 Overhead

    Command Shared Object Symbol 25.22% thread_pool.rb* ruby [.] vm_exec_core 2.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 2.60% thread_pool.rb* ruby [.] vm_call_cfunc 2.56% thread_pool.rb* ruby [.] rb_st_lookup 2.16% thread_pool.rb* ruby [.] gc_sweep_step 2.12% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.06% thread_pool.rb* ruby [.] find_table_entry_ind 2.04% thread_pool.rb* ruby [.] method_entry_get 1.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.58% thread_pool.rb* ruby [.] rb_id_table_lookup 1.56% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.98% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.87% thread_pool.rb* ruby [.] vm_call_ivar 0.86% thread_pool.rb* ruby [.] ruby_sip_hash13 0.76% thread_pool.rb* ruby [.] gc_mark_children 0.74% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.71% thread_pool.rb* ruby [.] rb_call0 0.71% thread_pool.rb* ruby [.] rb_get_alloc_func 0.69% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_opt_start 0.68% thread_pool.rb* ruby [.] match_at 0.68% thread_pool.rb* ruby [.] rb_vm_search_method_slowpath 0.66% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 0.62% thread_pool.rb* ruby [.] gc_mark_ptr
  57. # Event 'cycles' # # Baseline Delta Abs Shared Object

    Symbol # ........ ......... .................. .................................................... # 2.39% -0.75% ruby [.] rb_id_table_lookup 25.90% -0.34% ruby [.] vm_exec_core 2.35% -0.27% ruby [.] method_entry_get 0.88% -0.19% ruby [.] rb_vm_search_method_slowpath 0.15% +0.17% ruby [.] check_match 1.98% +0.16% ruby [.] find_table_entry_ind 0.57% +0.15% ruby [.] rb_call0 0.80% -0.15% libc-2.31.so [.] malloc +0.14% ruby [.] vm_search_method_fastpath 0.45% +0.13% ruby [.] vm_callee_setup_arg 2.52% +0.11% ruby [.] vm_call_iseq_setup_normal_0start_0params_0locals 0.41% +0.10% ruby [.] rb_vm_exec 0.61% +0.10% ruby [.] rb_get_alloc_func 0.60% +0.09% ruby [.] match_at 0.50% +0.08% ruby [.] vm_call_method_each_type 0.80% +0.08% ruby [.] vm_call_ivar 0.50% +0.07% ruby [.] rb_vm_invoke_bmethod 0.31% -0.07% ruby [.] tbl_update 0.64% -0.07% ruby [.] setup_parameters_complex 0.62% +0.07% ruby [.] rb_wb_protected_newobj_of
  58. 0 128 256 384 512 64 80 96 112 128

    144 160 Histgram Response time [msec] [ `discourse` | `script/bench.rb` | `home` ] result before after Faster
  59. Key result • Quantitative improvements: • Cache hit rate improved

    by 90.80% → 95.06% • Response time median decreased by 70msec → 69msec • Response time mode decreased by 69msec → 67msec • Profiler no longer considers `vm_call_iseq_setup` one of the hottest path. Specialised functions to show up instead.
  60. Conclusions Photo by Tomoyuki Kengaku

  61. Conclusions • Found that the way interpreter caches method resolution

    does not work well with inheritances, situations like e.g. Rails. • Fixed that problem by storing multiple keys per call site. • Results in improved cache hit rate, actual Rails performance boost.
  62. None
  63. Conclusions • The changeset has already merged into 2.7. •

    You can boost your Rails app TODAY by just using newer ruby!
  64. Future works Photo by Tomoyuki Kengaku

  65. None
  66. Future? • After 2.7 was released, inline cache mechanism (not

    only what I described today) was entirely discarded then rewritten by @ko1. • This was for Ractor (then-called Guild) migration. • https://bugs.ruby-lang.org/issues/16614 • This slowed things down a bit, according to him. • Would 2.7 be the fastest lethal ruby implementation we ever made? Watch out for Ruby 3! • It might be possible to re-implement polymorphic inline caches on top of Ractor. But cannot reuse this original design.
  67. None