Upgrade to Pro — share decks privately, control downloads, hide ads and more …

On sending methods

Urabe Shyouhei
September 04, 2020

On sending methods

Urabe Shyouhei

September 04, 2020
Tweet

More Decks by Urabe Shyouhei

Other Decks in Technology

Transcript

  1. About myself • Urabe, Shyouhei • @shyouhei • Ruby committer

    @ Money Forward, Inc. • 2020’s #2 most active `ruby/ruby` committer, according to GitHub.
  2. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  3. So what is vm_call_iseq_setup? zsh % global -xr vm_call_iseq_setup vm_call_iseq_setup

    134 vm_eval.c vm_call_iseq_setup(ec, reg_cfp, calli vm_call_iseq_setup 2757 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, T vm_call_iseq_setup 2758 vm_insnhelper.c return vm_call_iseq_setup(ec, cfp, callin zsh %
  4. So what is vm_call_iseq_setup? • It sets up VM stack

    frame, right at the moment when a method is about to run. • This function is not designed to be called frequently. • Normal method calls must skip it, because there are inline method caches.
  5. Inline method cache == disasm: #<ISeq:foo@-e:1 (1,0)-(1,21)> (catch: FALSE) 0000

    putself ( 1)[LiCa 0001 putself 0002 opt_send_without_block <calldata!mid:pp, argc:1, FCALL|ARGS_SIMPLE> 0004 leave [Re]
  6. zsh % global -xr CC_SET_FASTPATH CC_SET_FASTPATH 1635 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup_func(ci,

    param_size, CC_SET_FASTPATH 2239 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, TRUE); CC_SET_FASTPATH 2244 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_cfunc, TRUE); CC_SET_FASTPATH 2251 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_attrset, !((ci->flag & VM_CALL_AR CC_SET_FASTPATH 2258 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_ivar, !(ci->flag & VM_CALL_ARGS_ CC_SET_FASTPATH 2263 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2267 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_bmethod, TRUE); CC_SET_FASTPATH 2278 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_send, TRUE); CC_SET_FASTPATH 2281 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_call, TRUE); CC_SET_FASTPATH 2284 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_block_call, TRUE); CC_SET_FASTPATH 2361 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2382 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2492 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2497 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_super_method, TRUE); What is cc->call() ?
  7. In short… • `vm_call_iseq_setup` is called when inline method cache

    mishits. • This function to show up in the #2 most frequent operation in a profiler output means the cache mishits very frequently. • This is of course not by design!
  8. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this:
  9. [RUBY_DEBUG_COUNTER] 42122 killed by signal. [RUBY_DEBUG_COUNTER] mc_inline_hit 928210882 [RUBY_DEBUG_COUNTER] mc_inline_miss

    93975080 [RUBY_DEBUG_COUNTER] mc_global_hit 226048406 [RUBY_DEBUG_COUNTER] mc_global_miss 24044279 [RUBY_DEBUG_COUNTER] mc_global_state_miss 9246312 [RUBY_DEBUG_COUNTER] mc_class_serial_miss 84728768 [RUBY_DEBUG_COUNTER] mc_cme_complement 40695315 [RUBY_DEBUG_COUNTER] mc_cme_complement_hit 40690629 [RUBY_DEBUG_COUNTER] mc_search_super 242234049 [RUBY_DEBUG_COUNTER] mc_miss_by_nome 35088 [RUBY_DEBUG_COUNTER] mc_miss_by_distinct 31311981 [RUBY_DEBUG_COUNTER] mc_miss_by_refine 0 [RUBY_DEBUG_COUNTER] mc_miss_by_visi 1002898 [RUBY_DEBUG_COUNTER] mc_miss_spurious 61625113 [RUBY_DEBUG_COUNTER] ccf_general 37693318 [RUBY_DEBUG_COUNTER] ccf_iseq_setup 105021309
  10. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this: 928,210,882 928,210,882 + 93,975,080 ≅ 90.80%
  11. Deep dive into cache mishits • Now that we know

    the cache mechanism is somehow not working… • In order to fix it we have to know what exactly is going on when a cache mishits.
  12. zsh % LC_ALL=C bundle exec script/bench.rb 2>&1 | sort |

    uniq -c | sort -nr 14894134 cache eviction: ActiveModel::AttributeSet#fetch_value is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:39 4139225 cache eviction: Concurrent::Map#[] is called from concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/map.rb:133 3912799 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 3329993 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 1739540 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1722911 cache eviction: ActionView::OutputBuffer#<< is called from actionview-6.0.3.1/lib/action_view/buffers.rb:29 1670132 cache eviction: Module#site_id_lookup is called from message_bus-3.3.1/lib/message_bus/distributed_cache.rb:154 1285677 cache eviction: ActiveRecord::Type::HashLookupTypeMap#fetch is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/postgresql_adapte 1158602 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1111963 cache eviction: ActionDispatch::Journey::Visitors::Each#visit is called from actionpack-6.0.3.1/lib/action_dispatch/journey/visitors.rb:162 1079094 cache eviction: NilClass#blank? is called from activesupport-6.0.3.1/lib/active_support/core_ext/object/blank.rb:26 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#options is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#active_record is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 940974 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Attributes_Attribute is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 740641 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/associations/belongs_to_association.rb:119 686061 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 675719 cache eviction: ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#_quote is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/po 610875 cache eviction: User#respond_to? is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:212 609323 cache eviction: Class#type_for_attribute is called from activerecord-6.0.3.1/lib/active_record/table_metadata.rb:33 604621 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 555429 cache eviction: User#read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:323 555429 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 544500 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Nodes_BindParam is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 541906 cache eviction: ActiveModel::Type::Integer#cast is called from activemodel-6.0.3.1/lib/active_model/type/helpers/numeric.rb:19
  13. So? • `ActiveRecord::AttributeMethods::Read` is… • A Module. • Included from

    (`discourse`’s) `User` etc. • Which means… • Its `#read_attribute` is shared across several classes.
  14. Actual scenario 1. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2.

    Populates its call cache with `User#_read_attribute` 2. `Post#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `User#_read_attribute` entry and populates its own `Post#_read_attribute` 3. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `Post#_read_attribute` entry and populates its own `User#_read_attribute`
  15. Recap the problem • Inline method cache mishits too frequently.

    • Mishits happen when a method is shared among several classes possibly due to: • Inheritance • Module inclusion
  16. How do we tackle this? • Inline call cache to

    consider class hierarchy? • NG; there are refinements. Method resolution can vary from call site to call site. Class hierarchy is not suitable for the needs. • Also, it is dead slow to scan inheritance tree every time a method is called. • Remember (class, method) tuple for each and every call site • → Way to go.
  17. The new design • An inline cache was a map

    from (class, symbol) tuple to a method body. We modified it to stores multiple classes, which share the identical method name for identical method body. • Ko1 told me that this is a variant of polymorphic inline caching. • Hölzle et al., “Optimizing dynamically-typed object-oriented languages with polymorphic inline caches” in Proceedings of ECOOP ’91. LNCS 512. https://doi.org/10.1007/BFb0057013
  18. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  19. Samples: 40K of event 'cycles', Event count (approx.): 502933506043 Overhead

    Command Shared Object Symbol 25.22% thread_pool.rb* ruby [.] vm_exec_core 2.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 2.60% thread_pool.rb* ruby [.] vm_call_cfunc 2.56% thread_pool.rb* ruby [.] rb_st_lookup 2.16% thread_pool.rb* ruby [.] gc_sweep_step 2.12% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.06% thread_pool.rb* ruby [.] find_table_entry_ind 2.04% thread_pool.rb* ruby [.] method_entry_get 1.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.58% thread_pool.rb* ruby [.] rb_id_table_lookup 1.56% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.98% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.87% thread_pool.rb* ruby [.] vm_call_ivar 0.86% thread_pool.rb* ruby [.] ruby_sip_hash13 0.76% thread_pool.rb* ruby [.] gc_mark_children 0.74% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.71% thread_pool.rb* ruby [.] rb_call0 0.71% thread_pool.rb* ruby [.] rb_get_alloc_func 0.69% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_opt_start 0.68% thread_pool.rb* ruby [.] match_at 0.68% thread_pool.rb* ruby [.] rb_vm_search_method_slowpath 0.66% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 0.62% thread_pool.rb* ruby [.] gc_mark_ptr
  20. # Event 'cycles' # # Baseline Delta Abs Shared Object

    Symbol # ........ ......... .................. .................................................... # 2.39% -0.75% ruby [.] rb_id_table_lookup 25.90% -0.34% ruby [.] vm_exec_core 2.35% -0.27% ruby [.] method_entry_get 0.88% -0.19% ruby [.] rb_vm_search_method_slowpath 0.15% +0.17% ruby [.] check_match 1.98% +0.16% ruby [.] find_table_entry_ind 0.57% +0.15% ruby [.] rb_call0 0.80% -0.15% libc-2.31.so [.] malloc +0.14% ruby [.] vm_search_method_fastpath 0.45% +0.13% ruby [.] vm_callee_setup_arg 2.52% +0.11% ruby [.] vm_call_iseq_setup_normal_0start_0params_0locals 0.41% +0.10% ruby [.] rb_vm_exec 0.61% +0.10% ruby [.] rb_get_alloc_func 0.60% +0.09% ruby [.] match_at 0.50% +0.08% ruby [.] vm_call_method_each_type 0.80% +0.08% ruby [.] vm_call_ivar 0.50% +0.07% ruby [.] rb_vm_invoke_bmethod 0.31% -0.07% ruby [.] tbl_update 0.64% -0.07% ruby [.] setup_parameters_complex 0.62% +0.07% ruby [.] rb_wb_protected_newobj_of
  21. 0 128 256 384 512 64 80 96 112 128

    144 160 Histgram Response time [msec] [ `discourse` | `script/bench.rb` | `home` ] result before after Faster
  22. Key result • Quantitative improvements: • Cache hit rate improved

    by 90.80% → 95.06% • Response time median decreased by 70msec → 69msec • Response time mode decreased by 69msec → 67msec • Profiler no longer considers `vm_call_iseq_setup` one of the hottest path. Specialised functions to show up instead.
  23. Conclusions • Found that the way interpreter caches method resolution

    does not work well with inheritances, situations like e.g. Rails. • Fixed that problem by storing multiple keys per call site. • Results in improved cache hit rate, actual Rails performance boost.
  24. Conclusions • The changeset has already merged into 2.7. •

    You can boost your Rails app TODAY by just using newer ruby!
  25. Future? • After 2.7 was released, inline cache mechanism (not

    only what I described today) was entirely discarded then rewritten by @ko1. • This was for Ractor (then-called Guild) migration. • https://bugs.ruby-lang.org/issues/16614 • This slowed things down a bit, according to him. • Would 2.7 be the fastest lethal ruby implementation we ever made? Watch out for Ruby 3! • It might be possible to re-implement polymorphic inline caches on top of Ractor. But cannot reuse this original design.