How to optimize Ruby internal.

Watson
September 18, 2017

Improving performance is one of the release goals of "Ruby 3". I have written several patches that optimize Ruby internals toward that goal.

This talk describes how Ruby internals were optimized for Ruby 2.5.

Transcript

  1. How to optimize Ruby internal Shizuo Fujita

  2. Self • @watson1978 • Ubiregi Inc. • Ruby committer

  3. Ruby 3x3 • Ruby 3 needs to be 3 times faster than Ruby 2

  4. About this talk • How to measure Ruby internals • Ideas for optimizing Ruby internals • Optimize

  5. How to measure

  6. Prepare benchmark code

      hash1 = {aaa: 12, bbb: 34}
      hash2 = {ccc: 56, ddd: 78}
      loop do
        hash1.merge(hash2)
      end

  7. Measure

      $ iprofiler -timeprofiler ./miniruby ~/benchmark.rb

      hash1 = {aaa: 12, bbb: 34}
      hash2 = {ccc: 56, ddd: 78}
      loop do
        hash1.merge(hash2)
      end

  8. None
  9. None
  10. None
  11. None
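
The benchmark on slides 6 and 7 loops forever because it is meant to run under a profiler. For a quick number without a profiler, a bounded version can be timed with Ruby's standard `benchmark` library. This is only a minimal sketch: the `time_merge` helper and the iteration count are assumptions for illustration and do not come from the talk.

```ruby
require "benchmark"

# Hypothetical helper (not from the talk): time a fixed number of
# Hash#merge calls instead of looping forever under a profiler.
def time_merge(iterations)
  hash1 = { aaa: 12, bbb: 34 }
  hash2 = { ccc: 56, ddd: 78 }

  # Benchmark.realtime returns the elapsed wall-clock time in seconds.
  Benchmark.realtime do
    iterations.times { hash1.merge(hash2) }
  end
end

elapsed = time_merge(1_000_000)
puts format("1,000,000 merges took %.3f s", elapsed)
```

Running the same script with two different Ruby builds gives a rough before/after comparison; a profiler is still needed to see where the time goes.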
  12. Idea to optimize Ruby internal

  13. Method execution... • Method dispatching • Look up constants / methods • Ruby method execution • Implemented in Hash/String/Array/Time...

  14. Dispatching • Method execution

  15. Dispatching • Method execution

  16. Dispatching • Method execution

  17. Dispatching • Method execution

  18. • Focus on reducing method execution time

  19. • Focus on reducing method execution time • Remove dispatching inside methods • Remove redundant allocations

  20. Remove dispatching • Dispatching • Method execution • Dispatching Ruby method via rb_funcall()

  21. Remove dispatching • Dispatching • Method execution

  22. Remove redundant allocations • allocations • Dispatching • Method execution

  23. Remove redundant allocations • allocations • allocations • Dispatching • Method execution
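
One way to look for redundant allocations from the Ruby side is to compare `GC.stat(:total_allocated_objects)` before and after a call. The sketch below assumes a hypothetical helper named `allocated_objects`; it counts Ruby objects only, so raw `malloc()` calls made inside the C implementation are not visible to it.

```ruby
# Hypothetical helper (not from the talk): count the Ruby objects
# allocated while the block runs. The counter is cumulative, so the
# difference is the number of objects created in between.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

hash1 = { aaa: 12, bbb: 34 }
hash2 = { ccc: 56, ddd: 78 }

count = allocated_objects { hash1.merge(hash2) }
puts "Hash#merge allocated #{count} object(s)"
```

The exact count depends on the Ruby version, which is what makes the comparison useful when checking a patch.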

  24. Optimize

  25. Hash#merge • It used rb_obj_dup(), which calls rb_funcall()

  26. rb_obj_dup() • It calls Object#initialize_dup via rb_funcall()

  27. rb_obj_dup() • It calls Object#initialize_dup via rb_funcall() • Replace rb_obj_dup() with something like rb_ary_dup() to remove the redundant Object#initialize_dup call

  28. Patch for Hash#merge

  29. Patch for Hash#merge • Replaced rb_obj_dup() to eliminate the rb_funcall() call

  30. Hash#merge performance • [Chart: Ruby vs. Ruby dev; axis values not preserved]

      hash1 = { "a" => 100, "b" => 200 }
      hash2 = { "b" => 254, "c" => 300 }
      hash1.merge(hash2)

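
The dispatch that slides 25 to 29 remove is observable from Ruby: `Object#dup` invokes the private `initialize_dup` hook on the new copy. The sketch below uses a hypothetical `Tracked` class (not from the talk) to count how often that hook runs.

```ruby
# Hypothetical class (not from the talk) that counts how often the
# copy hook runs, to make the hidden method dispatch visible.
class Tracked
  @hook_calls = 0

  class << self
    attr_accessor :hook_calls
  end

  private

  # Object#dup calls this hook on the freshly created copy.
  def initialize_dup(original)
    self.class.hook_calls += 1
    super
  end
end

Tracked.new.dup
puts Tracked.hook_calls # prints 1
```

Because any class may override the hook, a generic duplication routine has to go through method dispatch, whereas a type-specific routine for a known built-in type can skip it.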
  31. Patch for Time (1)

  32. Patch for Time (2)

  33. Patch for Time (3)

  34. Time methods • Time methods called Ruby methods via rb_funcall()

  35. Time methods • Time methods called Ruby methods via rb_funcall() • Added some internal APIs to call methods directly

  36. Time#- performance • [Chart: Ruby vs. Ruby dev; axis values not preserved]

      Time.now - Time.at(0)

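
For readers unfamiliar with the expression benchmarked on slide 36, the sketch below shows what `Time#-` returns and times a bounded run of it. The iteration count is an assumption for illustration, not a figure from the talk.

```ruby
require "benchmark"

# Time - Time yields the difference in seconds as a Float.
diff = Time.at(10) - Time.at(4)
puts diff # prints 6.0

# Time - Numeric yields an earlier Time instead.
earlier = Time.at(10) - 4
puts earlier == Time.at(6) # prints true

# Bounded version of the benchmarked expression.
elapsed = Benchmark.realtime do
  100_000.times { Time.now - Time.at(0) }
end
puts format("100,000 subtractions took %.3f s", elapsed)
```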
  37. Others

  38. Result (2.4.1 vs 2.5.0-dev) • [Chart: ratios for Array, Hash, String, Time; values not preserved] • Ubuntu 17.04 • gcc version 7.0.1 • ruby 2.5.0dev (2017-08-27 trunk 59665) [x86_64-linux]

  39. Top 10 [values not preserved] • String#[nth] other • String#insert(pos, other) • Array#rassoc(obj) • Time#subsec • Time#to_i • Time#tv_sec • Hash#has_value? (no value) • Hash#value? (no value) • Time#to_r • Array#max(n)

  40. Worst 10 [values not preserved] • Array#cycle(n) {|obj| block} • Array#each_index {|index|} • String#to_i • Array#any? {|x| block} • Array#rindex(val) (not found) • String#lines • Time#usec • Array#bsearch_index {|x| block} • String#lines {|line|} • Hash literal

  41. Worst 10 (same list as slide 40)

  42. Ruby 2.5.0-dev: 35.7% slowdown

  43. Regression was fixed by shyouhei

  44. One more thing…

  45. Hash • A Hash object needs to allocate several heap areas

  46. Hash Internal • RHash: RBasic, st_table *, int, VALUE • st_table: char, char, char, int, st_hash_type *, st_index_t, st_index_t *, st_index_t, st_index_t, st_table_entry * • st_index_t []: st_index_t, st_index_t, …, st_index_t • st_table_entry []: st_table_entry, st_table_entry, …, st_table_entry

  47. 4 allocations • RHash • st_table • st_index_t [] • st_table_entry []

  48. Reused & faster • Slow allocating • [Same diagram: RHash, st_table, st_index_t [], st_table_entry []]

  49. Always allocating • [Same diagram: RHash, st_table, st_index_t [], st_table_entry []]

  50. Concatenate heap areas • Before: 2 allocations • After: 1 allocation

  51. Hash literal performance • [Chart: Before vs. After; axis values not preserved] • h = {foo: 12, bar: 34, baz: 56} • Caution: This is just a prototype

  52. vs. Ruby 2.4.1 • [Chart: After vs. Ruby 2.4.1; axis values not preserved] • Base: ruby 2.5.0dev (2017-09-10 trunk 59745) [x86_64-linux] • h = {foo: 12, bar: 34, baz: 56} • Caution: This is just a prototype

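
The heap areas behind a Hash (slides 45 to 50) cannot be listed individually from Ruby, but their combined footprint can be roughly inspected with `ObjectSpace.memsize_of` from the standard `objspace` library. This is a sketch only: the reported number is an implementation-dependent hint, and the internal layout of Hash differs between Ruby versions.

```ruby
require "objspace"

h = { foo: 12, bar: 34, baz: 56 }

# memsize_of reports an estimate, in bytes, of the memory the object
# consumes; the value varies across Ruby versions and platforms.
size = ObjectSpace.memsize_of(h)
puts "Hash with #{h.size} entries: about #{size} bytes"
```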
  53. You might have learned: • How to measure • Some ways to optimize effectively • Part of the current Ruby-dev status
  54. Thank you !!