Charles Nutter — Dynamic Languages on the JVM: Are We There Yet?

Moscow JUG
October 04, 2018

It’s been over a decade since JRuby first compiled Ruby code to the JVM.

Has the dream of a dynamic-language JVM arrived yet? We’ll review techniques used by JRuby and other dynamic languages and see how well various JVM technologies are working to make those languages fast and efficient.

Transcript

  1. Dynamic Languages on JVM
    Are We There Yet?

  2. Intro
    • Charles Oliver Nutter

    • @headius

    [email protected]

    • JRuby co-lead since 2006

    • Split time between dev and community work

    • Red Hat "research and prototyping" group

  3. What's In This Talk
    • Dynamic language challenges on the JVM

    • JRuby as case study for optimizing dynlangs

    • Current and future solutions

    • A few interesting benchmarks

    • The future of dynamic languages on JVM

  4. Dynamic Languages

  5. Static vs Dynamic
    Static:

    • Method calls typically have only a few possible targets

    • Method tables are immutable

    • Object structure is fixed and reflects specific reference or primitive types

    • Compile phase performs many type checks before runtime

    Dynamic:

    • Method calls have unpredictable number of targets

    • Methods may be added, removed

    • Objects may change shape as new code is executed, like a glorified Map

    • Compile (really, parse) phase only verifies syntax
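
    A minimal sketch of the dynamic-language bullets above, in plain Ruby. The class `Point`, the method `tag`, and the helper `shape_demo` are hypothetical names chosen for illustration. The sketch only shows that methods can appear and disappear at runtime, and that a live object gains a new instance variable (changes shape) as new code runs.

```ruby
# Hypothetical example class; not from the talk.
class Point
  def initialize(x)
    @x = x
  end
end

# Returns what was observed at each step of the demo.
def shape_demo
  point = Point.new(1)
  before = point.instance_variables   # only @x exists so far

  # Reopen the class and add a method while an instance is already live.
  Point.class_eval do
    def tag(label)
      @label = label                  # first write creates a new instance variable
      self
    end
  end
  point.tag("origin")
  after = point.instance_variables    # the same object now carries two variables

  # Methods can be removed again at runtime.
  Point.send(:remove_method, :tag)

  { before: before, after: after, responds_after_removal: point.respond_to?(:tag) }
end
```

    Calling `shape_demo` returns the instance variable lists before and after the new method ran, plus whether the object still responds to `tag` once it has been removed.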

  6. JVM vs Non-JVM
    JVM:

    • Write once, run anywhere means JVM or nothing, no native libraries

    • Known limitations of JVM influence language design

    • Hard to compete with Java, since JVM was made for it

    • Userbase limited to a subset of the JVM community

    Non-JVM:

    • Native access is the norm, lots of C libraries floating around

    • No real limit on wild and crazy language features

    • More competition with widely-used non-Java languages

    • Userbase limited only by language community

  7. Quadrant chart with axes JVM vs Non-JVM and Dynamic vs Static
    Languages shown: Groovy, Golo, Scala, Java, Kotlin, Ruby, Clojure, Erlang, Python, JavaScript, Rust, C++, C#, Visual Basic, Visual Basic.NET

  8. Why Bother?
    • Access to new communities and frameworks

    • Ruby, Python, JS, Erlang...all bring new ideas to the table

    • Rails is still a major force in web applications

    • Some problems fit dynamic languages well

    • Especially rapidly-evolving user-facing applications, like web

    • They can be educational...and a lot more fun

  9. (no text on this slide)

  10. JRuby Review
    • Ruby for the JVM

    • Two-way integration with Java, fitting into ecosystem

    • We are a Ruby implementation, but also a JVM language

    • Core classes largely written in Java

    • Parts of core and most of standard library in Ruby

    • Distribution like CRuby or as jars/wars, embedded into apps

    • No support for CRuby extensions, on purpose

  11. JRuby Challenges
    • Dynamic method calls, object shape (fields), constants

    • Local vars are determined at parse time, but mutable from lambdas

    • Frequent dependencies on C libraries

    • We wrap with a native access layer or use an equivalent JVM library

    • Many transient objects for numbers, small collections, local variable state

    • GC is not our problem...allocation is our problem!

  12. Defining a Class and Methods
    class Person
      # classes start out empty
      def initialize(name) # common name for all constructors
        @name = name # oops, guess we need room for @name variable in Person
      end
      def name=(new_name) # adding a setter method
        @name = new_name
      end
      attr_accessor :name # metaprogramming; defines setter and getter dynamically
      def upcase
        @name.upcase! # let's hope it's actually a String!
      end
    end

  13. Semantically Identical
    protected volatile Map methods = Collections.EMPTY_MAP;
    protected Map cachedMethods = Collections.EMPTY_MAP;
    private volatile Map classVariables = Collections.EMPTY_MAP;
    private volatile Map constants = Collections.EMPTY_MAP;
    private volatile Map autoloads = Collections.EMPTY_MAP;

    # assigning a new, empty class to "Person" constant
    Person = Class.new {
      def initialize(name) ... end
      def name=(new_name) ... end
      attr_accessor :name
      def upcase ... end
    }

  14. Enhance Person with Student
    module Student # mix-in inheritance, similar to traits
      attr_accessor :grade
      attr_accessor :student_id
    end
    class Person # oh yes, we can reopen classes any time
      include Student
    end

  15. Create People Collection
    people = [] # mutable array assigned to local var "people"
    5.times do |i| # alternate lambda/closure syntax
      person = Person.new("Charles#{i}") # new person with interpolated name
      person.grade = i
      people << person
    end
    people.map(&:upcase) # shortcut for map + upcase as lambda syntax

  16. Iterate and Print
    people.each do |person|
      puts <<~end_string # "heredoc" or "raw" multiline string
        Name: #{person.name}
        Grade: #{person.grade}
      end_string
    end

  17. Many Forms of Arguments
    def foo(a, (b, *c), d = 1, *rest, key_e:, key_f: 1, **key_rest)

    • a: positional arg

    • b, c: destructuring args

    • d: optional arg

    • rest: varargs collector

    • key_e: required keyword

    • key_f: optional keyword

    • key_rest: keyword varargs collector
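
    A small sketch of how one call binds to each kind of parameter. It assumes the splat collector is written before the keyword parameters, which is the order Ruby's parser accepts. The method body, the helper `foo_demo`, and all argument values are made up for illustration.

```ruby
# Same parameter kinds as on the slide; *rest comes before the keywords,
# because Ruby requires the splat to precede keyword parameters.
def foo(a, (b, *c), d = 1, *rest, key_e:, key_f: 1, **key_rest)
  { a: a, b: b, c: c, d: d, rest: rest,
    key_e: key_e, key_f: key_f, key_rest: key_rest }
end

# Illustrative call that exercises every parameter kind.
def foo_demo
  foo(:a, [:b, :c1, :c2], :d, :r1, :r2, key_e: 5, other: 6)
end
```

    In `foo_demo` the second argument is destructured into `b` and `c`, the two extra positional values land in `rest`, `key_f` falls back to its default, and the unknown keyword `other:` is collected into `key_rest`.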

  18. Cross-call Variables
    def match_it(regexp, string)
      regexp =~ string # implicit write of $~ "local" var
      Regexp.last_match # implicit read of $~ "local" var
    end

    max_grade = 0
    people.each {|person|
      max_grade = [max_grade, person.grade].max # block writes the enclosing scope's local var
    }
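
    A short sketch of why `$~` behaves like a "local" variable rather than a true global: each method frame has its own copy, so the runtime has to keep frame storage available for it. The method names `inner_match` and `outer_match` are hypothetical.

```ruby
def inner_match
  "abc" =~ /b/     # writes $~ in inner_match's own frame
  $~[0]
end

def outer_match
  "xyz" =~ /y/     # writes $~ in outer_match's frame
  inner = inner_match
  [inner, $~[0]]   # the caller's $~ was not overwritten by the callee's match
end
```

    `outer_match` returns the callee's match followed by the caller's own match, which shows that the two frames do not share `$~`.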

  19. Megamorphic Lambdas
    people.each do |person|
      # modify all people
    end
    people.each do |person|
      # commit each person to database
    end
    people.each do |person|
      # print out all people
    end

    As in Java, the JIT has a hard time seeing through the shared "each" method,
    so most of these lambdas won't inline or optimize well.

  20. Building a dynamic language
    on JVM is still a challenge.

  21. We need better tools for codegen,
    optimizing and specializing calls,
    dynamically shaping objects,
    and accessing local vars across calls

  22. Optimizing JRuby
    • JVM means JVM

    • Hotspot, J9, Android, Zing, even Java ME and IKVM

    • All versions since Java 1.4ish

    • Working largely within the bounds of JVM spec

    • Little modification across platforms, runtimes

    • Lots of experimentation to work around JVM limitations

    • Many upcoming experiments

  23. Multi-tier JIT
    • Going straight to bytecode is too expensive

    • Code runs first as IR in "simple" interpreter

    • Static optimizations, limited profile-driven optimizations

    • JIT transition is to either JVM bytecode or "optimized" interpreter

    • Upcoming: speculative optimization, deopt from bytecode to interpreter

    • Prototype profiling, inlining, deopt working but not finalized
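
    The tiering described above can be sketched as a toy model. This is a sketch under stated assumptions, not JRuby's implementation (which is written in Java): the class name `TieredMethod`, the call-count threshold, and the use of lambdas to stand in for the interpreter and for JIT output are all hypothetical.

```ruby
# Toy model of tiered execution: start on a slow path, switch to a
# "compiled" callable once the method has been called often enough.
class TieredMethod
  attr_reader :calls, :tier

  def initialize(threshold:, interpreter:, compiler:)
    @threshold = threshold       # number of calls to stay on the slow path
    @interpreter = interpreter   # callable standing in for the IR interpreter
    @compiler = compiler         # callable that returns a faster callable
    @compiled = nil
    @calls = 0
    @tier = :interpreter
  end

  def call(*args)
    @calls += 1
    if @compiled.nil? && @calls > @threshold
      @compiled = @compiler.call # pay the compilation cost only for hot code
      @tier = :compiled
    end
    (@compiled || @interpreter).call(*args)
  end
end
```

    With `threshold: 2`, the first two calls run through the interpreter lambda and the third call triggers the switch. Code that is never hot never pays the compilation cost, which is the motivation given in the first bullet.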

  24. Compiler pipeline diagram
    Stages: Lexical Analysis, Parsing, Semantic Analysis, Optimization, Bytecode Generation, Interpret
    Intermediate forms: AST, IR Instructions, CFG, DFG, ...
    Version labels: JRuby 1.7.x, 9000+

  25. IR Instructions
    def foo(b)
      if b == 1
        puts "one"
      else
        puts "!one"
      end
    end

    0: check_arity(req: 1)
    1: %self := receive_self
    2: b := receive_arg(0)
    3: %v_3 := call(b, :==, fixnum<1>)
    4: b_false(LBL_0, %v_3)
    5: %v_4 := copy("one")
    6: %v_5 := call(%self, :puts, %v_4)
    7: return(%v_5)
    LBL_0
    8: %v_6 := copy("!one")
    9: %v_7 := call(%self, :puts, %v_6)
    10: return(%v_7)

  26. Instructions
    Methods
    Flow

  27. Compiler Passes
    Added Code
    Removed Code

  28. Object Shaping
    • Translate dynamically-assigned instance variables to Java fields

    • Static analysis currently: inspect methods, speculate on size

    • Generate RubyObjectN subclass with real fields, fall back to IRubyObject[]

    • InvokeDynamic binds read/write as direct field access

    • Significant memory reduction, much less indirection

    • Upcoming: primitive support using field-doubling or fallback

    • Upcoming: allow a given class to produce multiple shapes
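
    The shaping idea above can be sketched as a toy model: reserve one fixed slot per instance variable that analysis expects, and fall back to a slower generic store for anything unexpected. This is an illustration only, not JRuby's generated RubyObjectN code. The class `ShapedObject`, its methods, and the Hash used as the fallback store are hypothetical.

```ruby
# Toy model of object shaping with a fallback path.
class ShapedObject
  def initialize(expected_names)
    # Map each expected variable name to a fixed slot index.
    @slot_index = expected_names.each_with_index.to_h
    @slots = Array.new(expected_names.size)
    @overflow = nil              # allocated only if an unexpected name shows up
  end

  def set(name, value)
    index = @slot_index[name]
    if index
      @slots[index] = value      # fast path: fixed slot
    else
      (@overflow ||= {})[name] = value   # fallback: generic storage
    end
    value
  end

  def get(name)
    index = @slot_index[name]
    return @slots[index] if index
    @overflow && @overflow[name]
  end

  def overflowed?
    !@overflow.nil?
  end
end
```

    An object built with `[:@name, :@grade]` stores both variables in fixed slots. A later write to an unexpected name such as `:@student_id` still works, but goes through the fallback store.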

  29. Rails `select` Bench
    rank    self   accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
       1  11.29%  11.29%    24145152     896789       105453936        3885626  org.jruby.runtime.builtin.IRubyObject[]
      23   0.82%  73.58%     1744576      18168         5894464          61396  org.jruby.gen.RubyObject17
      32   0.44%  78.33%      937784      23432         2071464          51774  org.jruby.gen.RubyObject2
      42   0.30%  81.96%      633312      19775         1525824          47666  org.jruby.gen.RubyObject0
      43   0.30%  82.26%      632168      11280         2783968          49705  org.jruby.gen.RubyObject6
      46   0.27%  83.10%      587072      18330         2133984          66671  org.jruby.gen.RubyObject1
      58   0.22%  86.08%      465056       3630         1672864          13066  org.jruby.gen.RubyObject25
      60   0.21%  86.51%      439304      10970         1493024          37313  org.jruby.gen.RubyObject3
      61   0.20%  86.71%      434608       9044         2311744          48151  org.jruby.gen.RubyObject5
      68   0.16%  87.93%      349936       7280         1305136          27180  org.jruby.gen.RubyObject4
      79   0.11%  89.34%      233824       3646          838432          13093  org.jruby.gen.RubyObject8
     238   0.01%  96.11%       28088        314           30816            345  org.jruby.gen.RubyObject14

  30. Array Specialization
    • Arrays used as both mutating "lists" and as immutable "vectors"

    • Hand-specialized 1- and 2-element implementations

    • Biggest impact is for small, transient arrays

    • Upcoming: Unify generation of shapes with instance variable logic

    • Upcoming: Primitive support via field-doubling or fallback

    • long[] as first pass with width-specific versions later

  31. Nearly Half are 1 or 2-element Arrays
    rank    self   accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
       5   4.90%  33.79%    10481824     218361        38183968         795489  org.jruby.RubyArray
      11   3.11%  56.32%     6661072     138762        22817680         475358  org.jruby.specialized.RubyArrayOneObject
      17   1.46%  67.96%     3124112      55779        15838128         282815  org.jruby.specialized.RubyArrayTwoObject
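
    As a quick check of the slide title against the table, the share of array objects that used the specialized one- and two-element classes can be computed directly. The counts are copied from the objs columns above; the constant and method names are illustrative.

```ruby
# Object counts taken from the profile table (alloc'ed objs column).
ALLOCATED_OBJS = {
  "RubyArray"          => 795_489,
  "RubyArrayOneObject" => 475_358,
  "RubyArrayTwoObject" => 282_815
}.freeze

# Object counts taken from the profile table (live objs column).
LIVE_OBJS = {
  "RubyArray"          => 218_361,
  "RubyArrayOneObject" => 138_762,
  "RubyArrayTwoObject" => 55_779
}.freeze

# Fraction of all listed array objects that are one- or two-element specializations.
def specialized_share(counts)
  specialized = counts["RubyArrayOneObject"] + counts["RubyArrayTwoObject"]
  specialized.to_f / counts.values.sum
end
```

    For the alloc'ed counts this comes to roughly 48.8%, and for the live counts roughly 47.1%, which matches "nearly half".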

  32. InvokeDynamic
    • Extensively used for all dynamic paths

    • Many different invocation types, including Ruby to Java

    • Dynamically binding instance variables

    • Constants actually act like constants

    • Startup time still suffers, but much less than in past

    • Upcoming: object shape guards, method cloning
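
    The guard-then-call pattern that an invokedynamic call site relies on can be modeled in plain Ruby as a monomorphic inline cache. This is a sketch of the general technique, not JRuby's binding code and not the `java.lang.invoke` API. The class `CallSite` and its `misses` counter are hypothetical, and the sketch ignores invalidation when a method is redefined.

```ruby
# Toy call site: cache the method found for the last receiver class and
# reuse it as long as a cheap class check (the guard) passes.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    unless receiver.class.equal?(@cached_class)   # guard failed: relink
      @misses += 1
      @cached_class = receiver.class
      @cached_method = receiver.class.instance_method(@name)  # slow lookup
    end
    @cached_method.bind(receiver).call(*args)     # fast path after the guard
  end
end
```

    Repeated calls with receivers of one class hit the cache. A receiver of a different class fails the guard and forces a new lookup, which is why call sites that see many receiver types are harder to optimize.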

  33. Method Call
    at DashE.RUBY$method$foo$0(-e:1)
    at java.lang.invoke.LambdaForm$DMH/168423058.invokeStatic_L7_L(LambdaForm$DMH)
    at java.lang.invoke.LambdaForm$BMH/648525677.reinvoke(LambdaForm$BMH)
    at java.lang.invoke.LambdaForm$MH/804564176.delegate(LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/1897115967.guard(LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/804564176.delegate(LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/1897115967.guard(LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/1805013491.linkToCallSite(LambdaForm$MH)
    at DashE.RUBY$script(-e:1)
    $ jruby -Xcompile.invokedynamic -e 'def foo; sleep; end; foo'

  34. Frame Elimination
    • Most implicit cross-call variables are known core methods

    • Only prepare frame space for what might be needed

    • Eliminate heap-based variable storage for closures

    • "Effectively final" similar to lambda

    • Upcoming: use deopt to lazily set up frame only when needed

    • Upcoming: explore StackWalker hacks to access vars directly
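
    The "effectively final" distinction can be illustrated with two small methods. In the first, the captured local is never reassigned, so a runtime could in principle pass its value to the block without shared heap storage. In the second, the block writes to the captured local, so the method and the block must share one mutable cell. The method names are hypothetical, and the comments describe the general idea rather than what JRuby does for these exact methods.

```ruby
# The captured variable is only read inside the block.
def read_only_capture(values)
  factor = 10                        # never reassigned: "effectively final"
  values.map { |v| v * factor }
end

# The captured variable is written inside the block, so the write
# must be visible to the enclosing method afterwards.
def mutating_capture(values)
  total = 0
  values.each { |v| total += v }
  total
end
```

    Both methods behave the same from the caller's point of view; the difference is only in how much variable state has to outlive a single stack frame.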

  35. Better JVMs and JITs
    • Starting to explore non-Hotspot "C2" runtimes

    • Eclipse OpenJ9, Azul Zing, Graal JIT, GraalVM

    • Mixed results so far, but we're working with those teams

    • Above all we want to be a "JVM language"

    • No dependence on a specific runtime to execute well

  36. Truffle

  37. What Makes Truffle Nice?
    • Only have to implement an AST (albeit a very rich AST)

    • AST is annotated and specialized by hand or generated

    • Trace-specific code specialization plus partial evaluation

    • Object shape specialization with DynamicObject

    • Communication of guards, inlining, deoptimization to Graal JIT

    • Integration, optimization, tooling with other Truffle languages

  38. TruffleRuby
    • Most of core implemented in Ruby

    • Targeted features as specialized AST nodes

    • Dependent on Graal, Truffle to boil it down

    • Nearly complete set of Ruby features

    • C extensions, binding-of-caller, optimized evals

    • Ongoing optimization work

  39. Why Not Truffle
    • Many users are still on Java 8, or on non-Hotspot JVMs

    • Truffle languages are unusably slow without Graal JIT

    • TruffleRuby not ready for production after five years

    • No supported, production-ready runtime today

    • Java integration may be more cumbersome

  40. (no text on this slide)

  41. What Can We Do?
    • Work within the bounds of JVM specification, JDK classes

    • Cooperate with JVM, JSR, JEP folks to fill in the gaps

    • Creatively use the capabilities we have at JVM level today

    • Focus on real-world users and their needs

    • Reconsider our options periodically

    View Slide

  42. Performance Status
    • Comparing small, medium, large examples

    • Numerics, data structures, Rails database access

    • JRuby (C2, Graal JIT, GraalVM) vs CRuby vs TruffleRuby (Native CE, EE)

    • CRuby 2.6 JIT excluded; does not appear to help these numbers

  43. Small Numeric Algorithms
    • Mandelbrot is the new Fibonacci

    • Simple fractal generator, single-method, nearly all math ops

    • Worst case scenario for JRuby: so many boxes

    • CRuby uses tagged pointers

    • TruffleRuby specializes code and gets help from Graal JIT

  44. def mandelbrot(size)
      sum = 0
      byte_acc = 0
      bit_num = 0
      y = 0
      while y < size
        ci = (2.0*y/size)-1.0
        x = 0
        while x < size
          zrzr = zr = 0.0
          zizi = zi = 0.0
          cr = (2.0*x/size)-1.5
          escape = 0b1

  45. bench_mandelbrot total execution time (lower is better)
    Bar chart; axis from 0s to 4s
    Labels: CRuby 2.5, JRuby C2, JRuby Graal CE
    Bar values in extraction order: 0.129s, 1.12s, 3.57s

  46. bench_mandelbrot total execution time (lower is better)
    Bar chart; axis from 0s to 0.13s
    Labels: JRuby Graal CE, JRuby Graal EE, TruffleRuby CE, TruffleRuby EE
    Bar values in extraction order: 0.123s, 0.111s, 0.118s, 0.129s

  47. Stupid Ruby Tricks
    • Occasionally users create throw-away arrays or hashes

    • `a > b ? b : a` vs `[a, b].sort[0]`

    • These tricks are not common...but ideally they should still be fast

    • Varargs also creates an Array

    • TruffleRuby's Array nodes give a big boost: 15-20x JRuby

    • More interested in C2 vs Graal

  48. def normal(a, b)
      a > b ? b : a
    end
    def array(a, b)
      [a,b].sort.at(0)
    end
    def varargs(*vals)
      vals.sort.at(0)
    end
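
    A minimal harness for comparing the three variants might look like the sketch below. The three methods are repeated so the block is self-contained. The helpers `ips` and `compare`, the iteration count, and the argument values are hypothetical, and this is not the benchmark setup used for the charts in the talk.

```ruby
def normal(a, b)
  a > b ? b : a
end

def array(a, b)
  [a, b].sort.at(0)      # allocates a throw-away two-element array
end

def varargs(*vals)
  vals.sort.at(0)        # the splat itself allocates an array per call
end

# Runs the given block `iterations` times and returns iterations per second,
# measured with a monotonic clock.
def ips(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

# Measures each variant with the same arguments.
def compare(iterations = 100_000)
  {
    normal:  ips(iterations) { normal(3, 7) },
    array:   ips(iterations) { array(3, 7) },
    varargs: ips(iterations) { varargs(3, 7) }
  }
end
```

    All three variants return the smaller argument; only the allocation behavior differs. A single short run like this is dominated by warmup on a JIT-based runtime, so the numbers are only meaningful after many repetitions.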

  49. iterations per second (higher is better)
    Bar chart; axis from 0M ips to 600M ips
    Labels: JRuby C2, JRuby Graal CE, JRuby Graal EE
    Series: Normal, Array, Varargs

  50. Bar chart; axis from 0M ips to 90M ips
    Labels: JRuby C2, JRuby Graal CE, JRuby Graal EE
    Series: Array, Varargs

  51. Red/Black Tree
    • Larger, more practical demonstration

    • Construct, traverse, mutate, destroy

    • Object shaping plays a larger role

    • Typically a good case for JRuby, perf near CRuby ext version

    • Remaining work is on object specialization and frame elimination

  52. require 'benchmark'
    # Algorithm based on "Introduction to Algorithms" by Cormen and others
    class RedBlackTree
      class Node
        attr_accessor :color
        attr_accessor :key
        attr_accessor :left
        attr_accessor :right
        attr_accessor :parent
        RED = :red
        BLACK = :black
        COLORS = [RED, BLACK].freeze
        def initialize(key, color = RED)
          raise ArgumentError, "Bad value for color parameter" unless COLORS.include?(color)
          @color = color
          @key = key
          @left = @right = @parent = NilNode.instance
        end
        def black?
          return color == BLACK
        end
        def red?
          return color == RED
        end

  53. def insert(x)
      insert_helper(x)
      x.color = Node::RED
      while x != root && x.parent.color == Node::RED
        if x.parent == x.parent.parent.left
          y = x.parent.parent.right
          if !y.nil? && y.color == Node::RED
            x.parent.color = Node::BLACK
            y.color = Node::BLACK
            x.parent.parent.color = Node::RED
            x = x.parent.parent
          else
            if x == x.parent.right
              x = x.parent
              left_rotate(x)
            end
            x.parent.color = Node::BLACK
            x.parent.parent.color = Node::RED
            right_rotate(x.parent.parent)
          end
        else
          y = x.parent.parent.left
          if !y.nil? && y.color == Node::RED
            x.parent.color = Node::BLACK
            y.color = Node::BLACK
            x.parent.parent.color = Node::RED
            x = x.parent.parent
          else
            if x == x.parent.left

  54. def rbt_bm
      n = 100_000
      a1 = []; n.times { a1 << rand(999_999) }
      a2 = []; n.times { a2 << rand(999_999) }
      start = Time.now
      tree = RedBlackTree.new
      n.times {|i| tree.add(i) }
      n.times { tree.delete(tree.root) }
      tree = RedBlackTree.new
      a1.each {|e| tree.add(e) }
      a2.each {|e| tree.search(e) }
      tree.inorder_walk {|key| key + 1 }
      tree.reverse_inorder_walk {|key| key + 1 }
      n.times { tree.minimum }
      n.times { tree.maximum }
      return Time.now - start
    end
    N = (ARGV[0] || 20).to_i
    N.times do
      puts rbt_bm.to_f
    end

  55. bench_red_black total execution time (lower is better)
    Bar chart; axis from 0s to 1.4s
    Labels: CRuby 2.5, JRuby C2, JRuby Graal CE, JRuby Graal EE, TruffleRuby CE, TruffleRuby EE
    Bar values in extraction order: 0.142, 0.315s, 0.481s, 0.573s, 0.403s, 1.222s

  56. ActiveRecord
    • Rails's database access layer, ORM

    • Largest part of the "full stack" Rails experience

    • If you have a perf problem, it's probably related to ActiveRecord

    • Largely Ruby code, with native bindings to database drivers

    • Real-world example that heavily leverages Ruby dynamism

    • Using sqlite3 for simplicity

  57. ActiveRecord Selects
    Bar chart: time for 1000 selects, lower is better; axis from 0 to 0.3
    Labels: CRuby 2.5, JRuby C2, JRuby Graal CE, JRuby Graal EE
    Column types: binary, boolean, date, datetime, decimal, float, integer, string, text, time, timestamp, *

  58. Warmup Curve
    Line chart; axis from 0 to 4
    Series: JRuby C2, JRuby Graal CE, JRuby Graal EE

  59. Warmup vs TruffleRuby
    Line chart; axis from 0 to 20
    Series: JRuby C2, TruffleRuby CE, TruffleRuby EE

  60. Performance Notes
    • Current techniques in JRuby work quite well

    • InvokeDynamic inlining, optimizing Ruby + Java code together

    • Specialized objects helping to reduce memory, allocation overhead

    • JVM JITs can do a lot more for us!

    • Graal JIT showing great promise

    • Looking forward to other JVM JITs improving escape analysis

    • Performance even on Java 8 beats CRuby

  61. Thank You!
    • Charles Oliver Nutter

    [email protected]

    • @headius

    • https://github.com/jruby/jruby
