Ruby 3 JIT's roadmap / RubyConf China 2020

RubyConf China 2020 http://www.rubyconfchina.org/

Takashi Kokubun

August 15, 2020

Transcript

  1. Ruby 3 JIT's roadmap
    RubyConf China 2020
    Takashi Kokubun / @k0kubun

  2. Self introduction
    • GitHub, Twitter: @k0kubun
    • Treasure Data, Inc.
    • Ruby committer
    • JIT: 2017~2020

  3. GitHub Sponsors - Thank you!

  4. Agenda
    1. What can Ruby's JIT do?
    2. Ruby 3 JIT's roadmap
    3. Recent progress in Ruby 3
    4. Current challenges

  5. 1. What can Ruby's JIT do?

  6. Ruby JIT's architecture
    Ruby interpreter
    JIT thread
    Ruby thread

  7. Ruby JIT's architecture
    Ruby interpreter
    JIT thread
    Ruby thread
    Method
    .c file

  8. Ruby JIT's architecture
    Ruby interpreter
    JIT thread
    Ruby thread
    Method
    .so file
    C compiler
    .c file
    Run

  9. Ruby JIT's architecture
    Ruby interpreter
    JIT thread
    Ruby thread
    Method
    .so file
    C compiler
    .c file
    Run
    Load and call

  10. What we can do with Ruby's JIT
    • Compile hot Ruby methods to native code
    • Eliminate VM interpretation cost: stack pointer (SP) / program counter (PC) updates
    • Optimize based on what the C compiler can know
    • Apply the Ruby VM-specific optimizations we implemented

  11. What we CAN'T do with Ruby's JIT
    • Optimize a short-running program
      • JIT needs time to optimize many methods
      • Things may be slower while the C compiler is running
    • Optimize based on the native code generated by the C compiler
      • Deoptimization based on the native instruction pointer / stack pointer

  12. Use case: Obviate micro optimizations
    Slower / Faster

  13. Use case: Obviate micro optimizations

  14. Use case: Obviate micro optimizations
    [Chart: iteration/sec on Ruby for num.zero? and a second num expression, VM vs. JIT]
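
A minimal sketch of how the comparison on this slide could be reproduced. It assumes the second expression on the chart is `num == 0`; the helper names `with_zero_p` and `with_eq` and the iteration count are hypothetical and not taken from the talk.

```ruby
require 'benchmark'

# Hypothetical helpers: the "readable" form and the micro-optimized form.
def with_zero_p(num)
  num.zero?
end

# Assumption: the slide's second expression is `num == 0`.
def with_eq(num)
  num == 0
end

ITERATIONS = 200_000

# Prints user/system/total/real time for each variant.
Benchmark.bm(10) do |x|
  x.report('num.zero?') { ITERATIONS.times { with_zero_p(0) } }
  x.report('num == 0')  { ITERATIONS.times { with_eq(0) } }
end
```

Running the script once with `ruby` and once with `ruby --jit` gives a rough VM vs. JIT comparison. The numbers depend heavily on the Ruby version and the machine.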

  15. 2. Ruby 3 JIT's roadmap

  16. Ruby JIT's goals
    • Optcarrot: 3x faster than Ruby 2.0
    • Sinatra, Rails: 10% throughput increase vs VM

  17. mame/optcarrot
    [Chart: frames/sec for three Ruby versions, VM vs. JIT]

  18. benchmark-driver/sinatra
    [Chart: requests/sec for three Ruby versions, VM vs. JIT]

  19. k0kubun/railsbench
    [Chart: requests/sec for three Ruby versions, VM vs. JIT]

  20. What should we do?
    • Current status:
      • Programs like Optcarrot run faster
      • Sinatra and Rails are still slightly slower than in no-JIT mode
    • Let's look at each major Ruby feature and at the JIT core

  21. Ruby 3 JIT's roadmap
    1. Variables / Constants
    2. Method inlining
    3. Constant folding
    4. Object allocation
    5. Deoptimization
    6. Scalability

  22. 1. Variables / Constants
    • Local variables: ⚠
    • Instance variables: ✅
    • Global variables: ❌
    • Constants: ❌

  23. 2. Method inlining
    • Ruby method: ✅
    • C method: ✅
    • super: ⚠
    • yield: ⚠

  24. 3. Constant folding
    • VM-optimized instructions: ⚠
    • C method: ⚠

  25. 4. Object allocation
    • Stack allocation: ⚠
    • Static allocation: ❌

  26. 5. Deoptimization
    • Reduce safepoints: ✅
    • Zero-cost deoptimization: ❌

  27. 6. Scalability
    • Single-page code: ✅
    • Code size reduction: ⚠
    • JIT dispatch cost: ⚠

  28. 3. Recent progress in Ruby 3

  29. Decrease ICache misses
    • ICache: Instruction Cache
    • Sinatra and Rails spend a lot of time on ICache misses
    • JIT increases that time further

  30. VTune: VM, JIT
    VTune: mame/optcarrot - VM

  31. VTune: VM, JIT
    VTune: mame/optcarrot - JIT

  32. VTune: benchmark-driver/sinatra - VM

  33. VTune: benchmark-driver/sinatra - JIT

  34. Decrease ICache misses
    • We implemented:
      • Deduplication of the same code
      • Hot / cold partitioning

  35. Decrease ICache misses

  36. Merge type checks on ivar access
    • Instance variable index is class-specific
    • We can check class only once per method
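
The idea in the two bullets above can be sketched with a toy model. This is not CRuby's implementation: `ToyClass`, `ToyObject`, and both `sum_*` methods are hypothetical names, and the "guard" is just a counter showing how many class checks each version performs.

```ruby
# Toy stand-in for a class: owns the ivar name => index table.
ToyClass = Struct.new(:iv_index_tbl)
# Toy stand-in for an object: a class pointer plus an array of ivar values.
ToyObject = Struct.new(:klass, :ivars)

POINT = ToyClass.new({ :@x => 0, :@y => 1 })

# Unmerged: every ivar read re-checks the class before trusting its index.
def sum_with_guard_per_access(obj, cached_class)
  guards = 0
  guards += 1
  raise 'class changed' unless obj.klass.equal?(cached_class)
  x = obj.ivars[cached_class.iv_index_tbl[:@x]]
  guards += 1
  raise 'class changed' unless obj.klass.equal?(cached_class)
  y = obj.ivars[cached_class.iv_index_tbl[:@y]]
  [x + y, guards]
end

# Merged: check the class once, then every read reuses the known indexes.
def sum_with_merged_guard(obj, cached_class)
  guards = 1
  raise 'class changed' unless obj.klass.equal?(cached_class)
  tbl = cached_class.iv_index_tbl
  [obj.ivars[tbl[:@x]] + obj.ivars[tbl[:@y]], guards]
end

point = ToyObject.new(POINT, [3, 4])
p sum_with_guard_per_access(point, POINT) # => [7, 2]
p sum_with_merged_guard(point, POINT)     # => [7, 1]
```

Both versions return the same sum. The merged version performs one class check instead of one per instance variable access.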

  37. Merge type checks on ivar access
    RObject
    flags: embed=false
    numiv
    ivptr → heap
    iv_index_tbl

    RObject
    flags: embed=true
    ary[]
    ary[]
    ary[]

  38. Merge type checks on ivar access
    [Chart: frames/sec for two Ruby versions, VM vs. JIT]

  39. Inline C method call
    • Inlining a C method had been hard because of:
      • The difficulty of detecting whether it’s safe to omit a call frame
      • Lots of indirection between the method call and the actual C function

  40. Inline C method call
    • We introduced a new kind of method definition in CRuby core, called a “builtin method”

  41. Inline C method call
    • We also added a way to “annotate” a C function in a “builtin method”
    • Now we can declare that a C function is safe to inline

  42. Inline C method call
    [Chart: Kernel#class in i/s for VM, JIT, and JIT + inlining; labeled speedups: 1.7x, 1.3x]

  43. Inline C method call
    x is Fixnum

  44. 4. Current challenges

  45. Allow exceptions in "inline" methods
    • A method that raises an exception can't be annotated “inline”
    • But many methods raise:
      • TypeError
      • NoMemoryError
    • Could we lazily update the backtrace and other state?

  46. Optimize VM -> JIT call
    • A VM -> JIT call is slower than a VM -> VM call
    • Speeding this up might offset the slowdown from ICache misses
    • Prepare a fast path or a VM instruction specialized for JIT calls

  47. Optimize VM -> JIT call
    def foo3
      nil
    end
    def foo2
      nil
    end
    def foo1
      nil
    end
    [Chart: time to call a method returning nil (ns, axis 0 to 32) vs. number of called methods (1 to 97), VM vs. JIT]
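
A rough sketch of how a measurement like the one charted above might be set up. The helpers `build_receiver` and `nanoseconds_per_call`, the `call_all` wrapper, and the round counts are assumptions for illustration, not the benchmark used in the talk. The wrapper call adds a small overhead that is averaged into each result.

```ruby
require 'benchmark'

# Builds an object with foo1..fooN, each returning nil as on the slide,
# plus a hypothetical call_all method that calls every one of them directly.
def build_receiver(method_count)
  klass = Class.new
  (1..method_count).each do |i|
    klass.class_eval("def foo#{i}; nil; end")
  end
  calls = (1..method_count).map { |i| "foo#{i}" }.join('; ')
  klass.class_eval("def call_all; #{calls}; end")
  klass.new
end

# Average wall-clock time per fooN call, in nanoseconds.
def nanoseconds_per_call(method_count, rounds: 10_000)
  receiver = build_receiver(method_count)
  elapsed = Benchmark.realtime { rounds.times { receiver.call_all } }
  elapsed * 1e9 / (rounds * method_count)
end

[1, 5, 9, 13].each do |count|
  printf("%3d methods: %.1f ns/call\n", count, nanoseconds_per_call(count))
end
```

Comparing a plain `ruby` run with a `ruby --jit` run shows how the per-call cost changes as more distinct methods are called. Absolute values vary by machine and Ruby version.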

  48. Improve inlining decision
    • Rails has polymorphic methods
    • If such a method is inlined into a specific caller, the class can be specific to each caller
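
To illustrate the point above, here is a small sketch with hypothetical classes and helpers (`Circle`, `Square`, `area_of`, `CLASSES_SEEN`). It records which classes reach a shared method overall and which reach it from each caller. It models only the observation behind the inlining decision, not the JIT itself.

```ruby
require 'set'

class Circle
  def initialize(radius)
    @radius = radius
  end

  def area
    Math::PI * @radius**2
  end
end

class Square
  def initialize(side)
    @side = side
  end

  def area
    @side**2
  end
end

# Classes observed at the shared method body and per (explicitly named) caller.
CLASSES_SEEN = Hash.new { |hash, site| hash[site] = Set.new }

# Shared helper: `shape.area` here is polymorphic when viewed on its own.
def area_of(shape, site)
  CLASSES_SEEN[:shared_body] << shape.class
  CLASSES_SEEN[site] << shape.class
  shape.area
end

def total_circle_area(circles)
  circles.sum { |c| area_of(c, :total_circle_area) }
end

def total_square_area(squares)
  squares.sum { |s| area_of(s, :total_square_area) }
end

total_circle_area([Circle.new(1), Circle.new(2)])
total_square_area([Square.new(3)])

CLASSES_SEEN.each { |site, classes| puts "#{site}: #{classes.to_a.join(', ')}" }
```

The shared body sees both classes, while each caller sees exactly one. A copy of `area_of` inlined into either caller could therefore assume a single class.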

  49. Improve inlining decision

  50. … and many more things in the roadmap
    • Inline `super` and `yield`
    • Optimize local variables
    • Optimize constants
    • …

  51. Summary
    • We reviewed Ruby 3 JIT's roadmap and what we've implemented so far.
    • While JIT is not yet useful for Rails, we've made progress toward that.
