
How to optimize Ruby internal.

Watson
September 18, 2017

One of the goals of "Ruby 3" is better performance. I have written several patches that optimize Ruby's internals toward that goal.

This talk describes how Ruby's internals were optimized for Ruby 2.5.

Transcript

  1. How to optimize Ruby internal Shizuo Fujita

  2. Self • @watson1978 • Ubiregi Inc. • Ruby committer

  3. Ruby 3x3 • Ruby 3 needs to be 3 times faster than Ruby 2
  4. About this talk • How to measure Ruby internals • Ideas for optimizing Ruby internals • Optimize
  5. How to measure

  6. Prepare benchmark code

    hash1 = {aaa: 12, bbb: 34}
    hash2 = {ccc: 56, ddd: 78}
    loop do
      hash1.merge(hash2)
    end
  7. Measure

    $ iprofiler -timeprofiler ./miniruby ~/benchmark.rb

    hash1 = {aaa: 12, bbb: 34}
    hash2 = {ccc: 56, ddd: 78}
    loop do
      hash1.merge(hash2)
    end
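
For a quick check without a native profiler, the same benchmark can be bounded and timed from Ruby itself. The following is only a minimal sketch: `measure_merge` and `ITERATIONS` are hypothetical names chosen for illustration and do not come from the talk, and the iteration count is an assumption.

```ruby
require "benchmark"

ITERATIONS = 100_000 # assumed iteration count; tune for your machine

# Hypothetical helper: runs Hash#merge a fixed number of times
# and returns the elapsed wall-clock time in seconds.
def measure_merge(iterations)
  hash1 = { aaa: 12, bbb: 34 }
  hash2 = { ccc: 56, ddd: 78 }
  Benchmark.realtime do
    iterations.times { hash1.merge(hash2) }
  end
end

elapsed = measure_merge(ITERATIONS)
puts format("%d merges in %.3f s (%.0f merges/s)", ITERATIONS, elapsed, ITERATIONS / elapsed)
```

Wall-clock timing like this only gives a total. A sampling profiler, as used on the slide, is still needed to see which C functions inside the interpreter consume the time.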
  8-11. (no transcribed text)
  12. Idea to optimize Ruby internal

  13. Method execution... • Method dispatching • Looking up constants / methods • Ruby method execution • Implemented in Hash/String/Array/Time...
  14-17. (Diagram, repeated on four slides; labels: Dispatching, Method execution)

  18. • Focus on reducing method execution time

  19. • Focus on reducing method execution time • Remove dispatching inside methods • Remove redundant allocations
  20. Remove dispatching (diagram labels: Dispatching, Method execution) • Dispatching Ruby method via rb_funcall()

  21. Remove dispatching (diagram labels: Dispatching, Method execution)

  22. Remove redundant allocations (diagram labels: allocations, Dispatching, Method execution)

  23. Remove redundant allocations (diagram labels: allocations, allocations, Dispatching, Method execution)

  24. Optimize

  25. Hash#merge • It used rb_obj_dup(), which calls rb_funcall()

  26. rb_obj_dup() • It calls Object#initialize_dup via rb_funcall()

  27. rb_obj_dup() • It calls Object#initialize_dup via rb_funcall() • Replace rb_obj_dup() with something like rb_ary_dup() to remove the redundant Object#initialize_dup call
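
The dispatch described above can be observed from Ruby code, because `dup` invokes the `initialize_dup` hook on the new copy. The following is a small sketch, not the patch itself: `TracedHash` and `hook_calls` are hypothetical names used only to make the hook visible.

```ruby
# Hypothetical subclass, used only to observe the hook.
class TracedHash < Hash
  attr_reader :hook_calls

  # Object#dup calls this method on the new copy and passes the original.
  def initialize_dup(original)
    @hook_calls = (@hook_calls || 0) + 1
    super # continues to initialize_copy, which copies the contents
  end
end

original = TracedHash.new
original[:a] = 1
copy = original.dup

puts "hook calls on copy: #{copy.hook_calls.inspect}"
puts "hook calls on original: #{original.hook_calls.inspect}"
puts "contents copied: #{copy == { a: 1 }}"
```

Every `dup` pays for this method lookup and call even when no subclass overrides the hook, which is the overhead the slide proposes to avoid for built-in classes.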
  28. Patch for Hash#merge

  29. Patch for Hash#merge • Replaced rb_obj_dup() to remove rb_funcall()

  30. Hash#merge performance [Chart: Ruby vs. Ruby dev; version digits and values not recoverable from the transcript]

    hash1 = { "a" => 100, "b" => 200 }
    hash2 = { "b" => 254, "c" => 300 }
    hash1.merge(hash2)
  31. Patch for Time (1)

  32. Patch for Time (2)

  33. Patch for Time (3)

  34. Time methods • Time methods called Ruby methods via rb_funcall()

  35. Time methods • Time methods called Ruby methods via rb_funcall() • Added some internal APIs to call those methods directly
  36. Time#- performance [Chart: Ruby vs. Ruby dev; version digits and values not recoverable from the transcript]

    Time.now - Time.at(0)
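
To compare this expression across Ruby builds, it can be timed with a monotonic clock. This is a rough sketch under stated assumptions: `iterations_per_second` is a hypothetical helper, the iteration count is arbitrary, and the talk does not say how its own numbers were collected.

```ruby
# Hypothetical helper: runs the block `iterations` times and
# returns the observed rate in iterations per second.
def iterations_per_second(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

diff = Time.now - Time.at(0) # Time minus Time gives seconds as a Float
rate = iterations_per_second(50_000) { Time.now - Time.at(0) }

puts "Time#- result class: #{diff.class}"
puts format("%.0f subtractions/s", rate)
```

Running the same script under two interpreters (for example a release build and a trunk build) gives a ratio comparable in spirit to the chart on the slide.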
  37. Others

  38. Result (2.4.1 vs 2.5.0-dev) [Chart: relative performance for Array, Hash, String, Time; values not recoverable from the transcript]

    Ubuntu 17.04
    gcc version 7.0.1
    ruby 2.5.0dev (2017-08-27 trunk 59665) [x86_64-linux]
  39. Top 10 (values not recoverable from the transcript)

    • String#[nth] other • String#insert(pos, other) • Array#rassoc(obj) • Time#subsec • Time[email protected] • Time[email protected] • Hash[email protected] (no value) • Hash#value (no value) • Time[email protected] • Array#max(n)
  40. Worst 10 (values not recoverable from the transcript)

    • Array#cycle(n) {|obj| block} • Array[email protected] {|index|} • String[email protected] • Array#any? {|x| block} • Array#rindex(val) (not found) • String#lines • Time#usec • Array[email protected] {|x| block} • String#lines {|line|} • Hash literal

  41. Worst 10 (same list as slide 40)
  42. Ruby 2.5.0-dev: 35.7% slowdown

  43. Regression was fixed by shyouhei

  44. One more thing…

  45. Hash • A Hash object needs to allocate several heap areas

  46. Hash Internal

    RHash: RBasic, st_table *, int, VALUE
    st_table: char, char, char, int, st_hash_type *, st_index_t, st_index_t *, st_index_t, st_index_t, st_table_entry *
    st_index_t []: st_index_t, st_index_t, …, st_index_t
    st_table_entry []: st_table_entry, st_table_entry, …, st_table_entry

  47. 4 allocations (same diagram: RHash, st_table, st_index_t [], st_table_entry [])

  48. Reused & faster • Slow allocating (same diagram)

  49. Always allocating (same diagram)
  50. Concatenate heap areas • Before: 2 allocations • After: 1 allocation
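
The heap areas behind a Hash are not visible as separate objects from Ruby, but their combined size can be estimated with `ObjectSpace.memsize_of` from the standard `objspace` library. This is a minimal sketch, assuming that estimate is good enough for a rough look: the reported byte counts are implementation-defined, they vary between Ruby versions, and they do not reveal how many allocations were made. The variable names are illustrative only.

```ruby
require "objspace"

empty = {}
small = { foo: 12, bar: 34, baz: 56 }

# Build a larger hash so its table storage clearly exceeds the small one.
large = {}
1000.times { |i| large[i] = i }

# memsize_of returns an estimated size in bytes, including memory
# owned by the object beyond its own slot.
sizes = {
  empty: ObjectSpace.memsize_of(empty),
  small: ObjectSpace.memsize_of(small),
  large: ObjectSpace.memsize_of(large)
}

sizes.each { |name, bytes| puts "#{name}: #{bytes} bytes" }
```

The internal layout of Hash has changed in later Ruby releases, so numbers from a current interpreter should not be read as a description of the Ruby 2.4 structures on the slides.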
  51. Hash literal performance [Chart: Before vs. After; values not recoverable from the transcript]

    h = {foo: 12, bar: 34, baz: 56}
    Caution: This is just a prototype
  52. vs. Ruby 2.4.1 [Chart: After vs. Ruby 2.4.1; values not recoverable from the transcript]

    Base: ruby 2.5.0dev (2017-09-10 trunk 59745) [x86_64-linux]
    h = {foo: 12, bar: 34, baz: 56}
    Caution: This is just a prototype
  53. You might have learned: • How to measure • Some ways to optimize effectively • Part of the current ruby-dev status
  54. Thank you !!