Moscow JUG
October 04, 2018

Charles Nutter — Dynamic Languages on the JVM: Are We There Yet?

It’s been over a decade since JRuby first compiled Ruby code to the JVM.

Has the dream of a dynamic-language JVM arrived yet? We’ll review techniques used by JRuby and other dynamic languages and see how well various JVM technologies are working to make those languages fast and efficient.

Transcript

  1. Intro
     • Charles Oliver Nutter
     • @headius
     • JRuby co-lead since 2006
     • Split time between dev and community work
     • Red Hat "research and prototyping" group
  2. What's In This Talk
     • Dynamic language challenges on the JVM
     • JRuby as a case study for optimizing dynamic languages
     • Current and future solutions
     • A few interesting benchmarks
     • The future of dynamic languages on the JVM
  3. Static vs Dynamic
     Static:
     • Method calls typically have only a few possible targets
     • Method tables are immutable
     • Object structure is fixed and reflects specific reference or primitive types
     • Compile phase performs many type checks before runtime
     Dynamic:
     • Method calls have an unpredictable number of targets
     • Methods may be added or removed at any time
     • Objects may change shape as new code is executed, like a glorified Map
     • Compile (really, parse) phase only verifies syntax
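
The dynamic behaviors listed above can be observed in a few lines of plain Ruby. In this sketch the `Greeter` class is hypothetical. A method appears and then disappears while an instance already exists, and the same object gains an instance variable after construction. A JVM implementation has to stay correct through every one of these changes.

```ruby
# Hypothetical class, used only to show runtime changes to methods and object shape.
class Greeter
end

g = Greeter.new
before = g.respond_to?(:hello)          # no such method yet

class Greeter                           # reopen: add a method at runtime
  def hello
    "hello"
  end
end
added = g.hello                         # the existing instance sees it immediately

class Greeter                           # reopen again: remove the method
  remove_method :hello
end
removed = begin
  g.hello
  false
rescue NoMethodError
  true                                  # the same call now fails
end

# The object's "shape" grows as instance variables are assigned.
shape_before = g.instance_variables
g.instance_variable_set(:@mood, :cheerful)
shape_after = g.instance_variables
```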
  4. JVM vs Non-JVM
     JVM:
     • Write once, run anywhere means JVM or nothing: no native libraries
     • Known limitations of the JVM influence language design
     • Hard to compete with Java, since the JVM was made for it
     • Userbase limited to a subset of the JVM community
     Non-JVM:
     • Native access is the norm, with lots of C libraries floating around
     • No real limit on wild and crazy language features
     • More competition with widely used non-Java languages
     • Userbase limited only by the language community
  5. Chart: languages placed by JVM vs non-JVM and dynamic vs static
     • Groovy, Golo, Scala, Java, Kotlin, Ruby, Clojure, Erlang, Python, JavaScript, Rust, C++, C#, Visual Basic, Visual Basic.NET
  6. Why Bother?
     • Access to new communities and frameworks
     • Ruby, Python, JS, Erlang... all bring new ideas to the table
     • Rails is still a major force in web applications
     • Some problems fit dynamic languages well
     • Especially rapidly evolving, user-facing applications, like the web
     • They can be educational... and a lot more fun
  7. JRuby Review
     • Ruby for the JVM
     • Two-way integration with Java, fitting into the ecosystem
     • We are a Ruby implementation, but also a JVM language
     • Core classes largely written in Java
     • Parts of core and most of the standard library written in Ruby
     • Distributed like CRuby, or as jars/wars embedded into apps
     • No support for CRuby C extensions, on purpose
  8. JRuby Challenges
     • Dynamic method calls, object shape (fields), constants
     • Local vars are determined at parse time, but mutable from lambdas
     • Frequent dependencies on C libraries
     • We wrap them with a native access layer or use an equivalent JVM library
     • Many transient objects for numbers, small collections, local variable state
     • GC is not our problem... allocation is our problem!
  9. Defining a Class and Methods
        class Person                    # classes start out empty
          def initialize(name)          # common name for all constructors
            @name = name                # oops, guess we need room for @name variable in Person
          end

          def name=(new_name)           # adding a setter method
            @name = new_name
          end

          attr_accessor :name           # metaprogramming; defines setter and getter dynamically

          def upcase
            @name.upcase!               # let's hope it's actually a String!
          end
        end
  10. Semantically Identical
        protected volatile Map<String, DynamicMethod> methods = Collections.EMPTY_MAP;
        protected Map<String, CacheEntry> cachedMethods = Collections.EMPTY_MAP;
        private volatile Map<String, IRubyObject> classVariables = Collections.EMPTY_MAP;
        private volatile Map<String, ConstantEntry> constants = Collections.EMPTY_MAP;
        private volatile Map<String, Autoload> autoloads = Collections.EMPTY_MAP;

        # assigning a new, empty class to "Person" constant
        Person = Class.new {
          def initialize(name) ... end
          def name=(new_name) ... end
          attr_accessor :name
          def upcase ... end
        }
  11. Enhance Person with Student
        module Student                  # mix-in inheritance, similar to traits
          attr_accessor :grade
          attr_accessor :student_id
        end

        class Person                    # oh yes, we can reopen classes any time
          include Student
        end
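
Reopening a class changes objects that already exist, which is one reason a runtime cannot cache method lookups forever. The following is a self-contained sketch with its own minimal `Person` and `Student`, trimmed from the slides. An instance created before `include` picks up the new accessors afterward.

```ruby
module Student
  attr_accessor :grade
end

class Person
  def initialize(name)
    @name = name
  end
end

early = Person.new("Charles0")           # created before Student is mixed in
had_grade = early.respond_to?(:grade=)   # not yet available

class Person                             # reopen the class...
  include Student                        # ...and mix in new methods
end

early.grade = 3                          # the pre-existing object now has them
ancestors = Person.ancestors.take(2)     # Student now sits right after Person
```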
  12. Create People Collection
        people = []                              # mutable array assigned to local var "people"
        5.times do |i|                           # alternate lambda/closure syntax
          person = Person.new("Charles#{i}")     # new person with interpolated name
          person.grade = i
          people << person
        end
        people.map(&:upcase)                     # &:upcase is shorthand for map { |p| p.upcase }
  13. Iterate and Print
        people.each do |person|
          puts <<~end_string                     # squiggly "heredoc": multiline string with interpolation
            Name: #{person.name}
            Grade: #{person.grade}
          end_string
        end
  14. Many Forms of Arguments
        def foo(a, (b, *c), d = 1, *rest, key_e:, key_f: 1, **key_rest)
        end
     • a: positional arg
     • b, c: destructuring args
     • d: optional arg
     • rest: varargs collector
     • key_e: required keyword
     • key_f: optional keyword
     • key_rest: keyword varargs collector
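
Binding arguments to all these forms is real work on every call. Returning the bound values makes the distribution visible. This sketch uses the legal Ruby parameter order (required and destructuring, optional, rest, then keywords), and the method body is purely illustrative.

```ruby
# Legal order: required/destructuring, optional, rest, keywords, keyword rest.
def foo(a, (b, *c), d = 1, *rest, key_e:, key_f: 1, **key_rest)
  { a: a, b: b, c: c, d: d, rest: rest, key_e: key_e, key_f: key_f, key_rest: key_rest }
end

bound = foo(0, [1, 2, 3], 4, 5, 6, key_e: 7, extra: 8)
# a=0, b=1, c=[2, 3], d=4, rest=[5, 6], key_e=7, key_f=1 (default), key_rest holds extra: 8
```

Omitting `key_e:` raises `ArgumentError`, since it is a required keyword.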
  15. Cross-call Variables
        def match_it(regexp, string)
          regexp =~ string                       # implicit write of $~ "local" var
          Regexp.last_match                      # implicit read of $~ "local" var
        end

        max_grade = 0
        people.each { |person| max_grade = [max_grade, person.grade].max }   # block writes an outer local
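
Both kinds of cross-call variable can be observed from plain Ruby. In this sketch the `caller_frame` helper is hypothetical. `$~` written by `=~` is visible in the method that performed the match but does not leak into its caller. A block that assigns an enclosing local makes that variable outlive a single call, so it cannot live only in a JVM local slot.

```ruby
def match_it(regexp, string)
  regexp =~ string            # writes $~ in match_it's own frame
  Regexp.last_match           # reads it back from the same frame
end

def caller_frame
  "zzz" =~ /z/                             # sets caller_frame's own $~
  inner = match_it(/(\d+)/, "abc 42")
  [inner[1], $~[0]]                        # match_it's $~ did not overwrite this frame's
end

# A block that assigns an outer local: the variable is shared with the closure.
max_grade = 0
[3, 7, 5].each { |grade| max_grade = [max_grade, grade].max }
```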
  16. Megamorphic Lambdas
        people.each do |person|
          # modify all people
        end
        people.each do |person|
          # commit each person to database
        end
        people.each do |person|
          # print out all people
        end
     As in Java, it is hard to see through a common "each" method, so most of these lambdas won't inline or optimize well.
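
One way to picture the problem is a polymorphic inline cache at the single place inside `each` where the block is invoked. The toy model below records which blocks it has seen. The class `YieldSite`, the method `my_each`, and the two-target limit are all hypothetical. Once a third distinct block arrives, the site gives up and goes megamorphic, which is roughly where a JIT stops inlining through that site.

```ruby
# Toy polymorphic inline cache (PIC) for the one "yield" site inside an each-like method.
class YieldSite
  MAX_TARGETS = 2   # hypothetical limit; real VMs choose their own

  def initialize
    @targets = {}.compare_by_identity   # block => times seen (keyed by object identity)
    @megamorphic = false
  end

  def call(block, value)
    unless @megamorphic || @targets.key?(block)
      if @targets.size < MAX_TARGETS
        @targets[block] = 0
      else
        @targets.clear                  # too many targets: stop tracking individual blocks
        @megamorphic = true
      end
    end
    @targets[block] += 1 unless @megamorphic
    block.call(value)
  end

  def state
    return :megamorphic if @megamorphic
    case @targets.size
    when 0 then :uninitialized
    when 1 then :monomorphic
    else :polymorphic
    end
  end
end

# Hypothetical each-like method: every caller's block flows through the same site.
SITE = YieldSite.new
def my_each(items, &block)
  items.each { |item| SITE.call(block, item) }
end

modify  = ->(p) { p.upcase }
commit  = ->(p) { p.length }
display = ->(p) { "Name: #{p}" }

people = %w[ann bob]
my_each(people, &modify)
after_one = SITE.state       # :monomorphic
my_each(people, &commit)
after_two = SITE.state       # :polymorphic
my_each(people, &display)
after_three = SITE.state     # :megamorphic
```

Lambdas are passed explicitly so that each one keeps its identity across calls. A literal block creates a new Proc object each time, so a real VM would key on the block's code instead.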
  17. We need better tools for codegen, optimizing and specializing calls, dynamically shaping objects, and accessing local vars across calls
  18. Optimizing JRuby
     • JVM means JVM
     • HotSpot, J9, Android, Zing, even Java ME and IKVM
     • All versions since roughly Java 1.4
     • Working largely within the bounds of the JVM spec
     • Little modification across platforms and runtimes
     • Lots of experimentation to work around JVM limitations
     • Many upcoming experiments
  19. Multi-tier JIT
     • Going straight to bytecode is too expensive
     • Code runs first as IR in a "simple" interpreter
     • Static optimizations, limited profile-driven optimizations
     • JIT transition is to either JVM bytecode or an "optimized" interpreter
     • Upcoming: speculative optimization, deopt from bytecode to interpreter
     • Prototype profiling, inlining, and deopt working but not finalized
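
The tiering and deoptimization idea can be modeled in a few lines. This is a toy, not JRuby's implementation. `TieredMethod`, the threshold of 3 calls, and the Integer-only speculation are assumptions chosen to keep the example small.

```ruby
# Toy model of tiered execution with speculation and deoptimization.
class TieredMethod
  JIT_THRESHOLD = 3 # hypothetical; real thresholds are much higher and tunable

  attr_reader :tier

  def initialize(&body)
    @body = body            # stands in for running the IR in the simple interpreter
    @calls = 0
    @tier = :interpreter
    @compiled = nil
  end

  def call(*args)
    if @tier == :compiled
      # Guard on the speculated argument types; if it holds, run the "compiled" code.
      return @compiled.call(*args) if args.all? { |arg| arg.is_a?(Integer) }
      deoptimize            # speculation failed: fall back to the interpreter
    end
    @calls += 1
    compile if @calls >= JIT_THRESHOLD
    @body.call(*args)
  end

  private

  # A real JIT would emit JVM bytecode specialized for Integer arguments;
  # this sketch only records the transition and reuses the same block.
  def compile
    @compiled = @body
    @tier = :compiled
  end

  def deoptimize
    @compiled = nil
    @tier = :interpreter
    @calls = 0
  end
end

add = TieredMethod.new { |a, b| a + b }
3.times { add.call(1, 2) }   # third call crosses the threshold
add.call(1.5, 2)             # Float argument fails the guard and deoptimizes
```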
  20. IR Instructions
        def foo(b)
          if b == 1
            puts "one"
          else
            puts "!one"
          end
        end

        0: check_arity(req: 1)
        1: %self := receive_self
        2: b := receive_arg(0)
        3: %v_3 := call(b, :==, fixnum<1>)
        4: b_false(LBL_0, %v_3)
        5: %v_4 := copy("one")
        6: %v_5 := call(%self, :puts, %v_4)
        7: return(%v_5)
        LBL_0
        8: %v_6 := copy("!one")
        9: %v_7 := call(%self, :puts, %v_6)
        10: return(%v_7)
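
To make the listing concrete, here is a tiny interpreter for just the instructions shown for `foo`. The array encoding of instructions and the `Recorder` stand-in for `self` are assumptions made for this sketch. JRuby's real IR interpreter works on instruction objects and is far more elaborate.

```ruby
# The IR for foo(b), encoded as plain arrays: [op, ...operands].
# Operands are [:var, name] for IR variables or [:lit, value] for literals.
FOO_IR = [
  [:check_arity, 1],
  [:receive_self, "%self"],
  [:receive_arg, "b", 0],
  [:call, "%v_3", [:var, "b"], :==, [[:lit, 1]]],
  [:b_false, :LBL_0, [:var, "%v_3"]],
  [:copy, "%v_4", [:lit, "one"]],
  [:call, "%v_5", [:var, "%self"], :puts, [[:var, "%v_4"]]],
  [:return, [:var, "%v_5"]],
  [:label, :LBL_0],
  [:copy, "%v_6", [:lit, "!one"]],
  [:call, "%v_7", [:var, "%self"], :puts, [[:var, "%v_6"]]],
  [:return, [:var, "%v_7"]]
].freeze

def interpret(ir, self_obj, args)
  labels = {}
  ir.each_with_index { |insn, i| labels[insn[1]] = i if insn[0] == :label }
  vars = {}
  value = ->((kind, x)) { kind == :lit ? x : vars.fetch(x) }
  pc = 0
  loop do
    insn = ir.fetch(pc)
    pc += 1
    case insn[0]
    when :check_arity
      raise ArgumentError, "expected #{insn[1]} argument(s)" unless args.size == insn[1]
    when :receive_self then vars[insn[1]] = self_obj
    when :receive_arg  then vars[insn[1]] = args[insn[2]]
    when :copy         then vars[insn[1]] = value.call(insn[2])
    when :call
      _, dest, recv, meth, call_args = insn
      # send (not public_send) because self-calls like puts may target private methods
      vars[dest] = value.call(recv).send(meth, *call_args.map { |a| value.call(a) })
    when :b_false
      pc = labels.fetch(insn[1]) unless value.call(insn[2])
    when :label
      # labels are only jump targets
    when :return
      return value.call(insn[1])
    else
      raise "unknown instruction #{insn[0]}"
    end
  end
end

# Stand-in for self that records what "puts" would have printed.
class Recorder
  attr_reader :lines

  def initialize
    @lines = []
  end

  def puts(str)
    @lines << str
    nil
  end
end
```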
  21. Object Shaping
     • Translate dynamically assigned instance variables to Java fields
     • Currently static analysis: inspect methods, speculate on size
     • Generate a RubyObjectN subclass with real fields, falling back to IRubyObject[]
     • InvokeDynamic binds reads/writes as direct field access
     • Significant memory reduction, much less indirection
     • Upcoming: primitive support using field-doubling or fallback
     • Upcoming: allow a given class to produce multiple shapes
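
The "inspect methods, speculate on size" step can be approximated by scanning a class body for instance-variable tokens. The sketch below uses Ruby's standard `Ripper` lexer for that. It then stores values in a fixed slot array, with a hash as the fallback for anything the scan missed. The names `speculate_ivars`, `ShapedObject`, and `NODE_SOURCE` are hypothetical. JRuby itself generates real Java fields (`RubyObjectN`) rather than using a Ruby array.

```ruby
require 'ripper'

# Collect the distinct instance variables mentioned anywhere in a piece of source.
def speculate_ivars(source)
  Ripper.lex(source)
        .select { |(_pos, type, _tok)| type == :on_ivar }
        .map { |(_pos, _type, tok)| tok.to_sym }
        .uniq
end

# Objects of one "shape" keep the speculated ivars in fixed slots; anything else overflows.
class ShapedObject
  def initialize(ivars)
    @index = ivars.each_with_index.to_h   # ivar name => slot number
    @slots = Array.new(ivars.size)
    @overflow = nil                       # created only if the speculation was wrong
  end

  def ivar_set(name, value)
    if (i = @index[name])
      @slots[i] = value
    else
      (@overflow ||= {})[name] = value
    end
  end

  def ivar_get(name)
    i = @index[name]
    i ? @slots[i] : @overflow&.[](name)
  end

  def overflowed?
    !@overflow.nil?
  end
end

NODE_SOURCE = <<~RUBY
  class Node
    def initialize(key, color)
      @color = color
      @key = key
      @left = @right = @parent = nil
    end
  end
RUBY

shape = speculate_ivars(NODE_SOURCE)   # [:@color, :@key, :@left, :@right, :@parent]
node = ShapedObject.new(shape)
node.ivar_set(:@key, 42)
node.ivar_set(:@extra, true)           # not seen by the analysis, so it overflows
```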
  22. Rails `select` Bench (allocation profile)
        rank    self   accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
           1  11.29%  11.29%    24145152     896789       105453936        3885626  org.jruby.runtime.builtin.IRubyObject[]
          23   0.82%  73.58%     1744576      18168         5894464          61396  org.jruby.gen.RubyObject17
          32   0.44%  78.33%      937784      23432         2071464          51774  org.jruby.gen.RubyObject2
          42   0.30%  81.96%      633312      19775         1525824          47666  org.jruby.gen.RubyObject0
          43   0.30%  82.26%      632168      11280         2783968          49705  org.jruby.gen.RubyObject6
          46   0.27%  83.10%      587072      18330         2133984          66671  org.jruby.gen.RubyObject1
          58   0.22%  86.08%      465056       3630         1672864          13066  org.jruby.gen.RubyObject25
          60   0.21%  86.51%      439304      10970         1493024          37313  org.jruby.gen.RubyObject3
          61   0.20%  86.71%      434608       9044         2311744          48151  org.jruby.gen.RubyObject5
          68   0.16%  87.93%      349936       7280         1305136          27180  org.jruby.gen.RubyObject4
          79   0.11%  89.34%      233824       3646          838432          13093  org.jruby.gen.RubyObject8
         238   0.01%  96.11%       28088        314           30816            345  org.jruby.gen.RubyObject14
  23. Array Specialization
     • Arrays are used both as mutating "lists" and as immutable "vectors"
     • Hand-specialized 1- and 2-element implementations
     • Biggest impact is for small, transient arrays
     • Upcoming: unify generation of shapes with instance variable logic
     • Upcoming: primitive support via field-doubling or fallback
     • long[] as a first pass, with width-specific versions later
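
The core of the hand-specialized arrays is to hold one or two elements in plain fields and switch to a general backing store only when the array grows. Below is a minimal Ruby model of that idea. The `SmallArray` class and its method names are hypothetical. JRuby's `RubyArrayOneObject` and `RubyArrayTwoObject` are Java classes with their own logic.

```ruby
# Toy model: store up to two elements in dedicated fields, "unpack" on growth.
class SmallArray
  attr_reader :representation

  def initialize(*elems)
    case elems.size
    when 1
      @e0 = elems[0]
      @representation = :one
    when 2
      @e0, @e1 = elems
      @representation = :two
    else
      @store = elems
      @representation = :general
    end
  end

  def size
    case @representation
    when :one then 1
    when :two then 2
    else @store.size
    end
  end

  def [](i)
    case @representation
    when :one then i == 0 ? @e0 : nil
    when :two then i == 0 ? @e0 : (i == 1 ? @e1 : nil)
    else @store[i]
    end
  end

  # Growing past the specialized capacity converts to the general form.
  def <<(value)
    case @representation
    when :one
      @e1 = value
      @representation = :two
    when :two
      @store = [@e0, @e1, value]
      @e0 = @e1 = nil
      @representation = :general
    else
      @store << value
    end
    self
  end

  def to_a
    Array.new(size) { |i| self[i] }
  end
end
```

On the JVM, the payoff is skipping the separate backing-array allocation for the many small, short-lived arrays.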
  24. Nearly Half are 1- or 2-element Arrays
        rank    self   accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
           5   4.90%  33.79%    10481824     218361        38183968         795489  org.jruby.RubyArray
          11   3.11%  56.32%     6661072     138762        22817680         475358  org.jruby.specialized.RubyArrayOneObject
          17   1.46%  67.96%     3124112      55779        15838128         282815  org.jruby.specialized.RubyArrayTwoObject
  25. InvokeDynamic
     • Extensively used for all dynamic paths
     • Many different invocation types, including Ruby to Java
     • Dynamically binding instance variables
     • Constants actually act like constants
     • Startup time still suffers, but much less than in the past
     • Upcoming: object shape guards, method cloning
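
A core job of these dynamic call sites is caching the looked-up method and discarding the cache when the class changes. The model below is a plain-Ruby sketch under stated assumptions. `RClass` and `CallSite` are hypothetical, and a single serial number per class stands in for the invalidation machinery. On the JVM, this kind of invalidation can be expressed with method-handle guards and `java.lang.invoke.SwitchPoint`.

```ruby
# Hypothetical class object: a mutable method table plus a serial number
# that changes whenever the table does.
class RClass
  attr_reader :serial

  def initialize
    @methods = {}
    @serial = 0
  end

  def define(name, &body)
    @methods[name] = body
    @serial += 1            # any cached lookup against the old table is now stale
  end

  def lookup(name)
    @methods.fetch(name)
  end
end

# Monomorphic inline cache: remembers one (class, serial) => target pairing.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_serial = nil
    @target = nil
    @misses = 0
  end

  def call(klass, *args)
    unless klass.equal?(@cached_class) && klass.serial == @cached_serial
      @misses += 1          # slow path: full method lookup, then re-cache
      @cached_class = klass
      @cached_serial = klass.serial
      @target = klass.lookup(@name)
    end
    @target.call(*args)     # call the cached target (freshly refilled on a miss)
  end
end

person = RClass.new
person.define(:greet) { |name| "Hello, #{name}" }
site = CallSite.new(:greet)
site.call(person, "Charles")                   # miss: first lookup
site.call(person, "Charles")                   # hit
person.define(:greet) { |name| "Hi, #{name}" } # redefinition bumps the serial
latest = site.call(person, "Charles")          # miss again, picks up the new method
```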
  26. Method Call
        $ jruby -Xcompile.invokedynamic -e 'def foo; sleep; end; foo'

        at DashE.RUBY$method$foo$0(-e:1)
        at java.lang.invoke.LambdaForm$DMH/168423058.invokeStatic_L7_L(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$BMH/648525677.reinvoke(LambdaForm$BMH)
        at java.lang.invoke.LambdaForm$MH/804564176.delegate(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/1897115967.guard(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/804564176.delegate(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/1897115967.guard(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/1805013491.linkToCallSite(LambdaForm$MH)
        at DashE.RUBY$script(-e:1)
  27. Frame Elimination
     • Most implicit cross-call variables are used by known core methods
     • Only prepare frame space for what might be needed
     • Eliminate heap-based variable storage for closures
     • "Effectively final", similar to Java lambdas
     • Upcoming: use deopt to lazily set up a frame only when needed
     • Upcoming: explore StackWalker hacks to access vars directly
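
Preparing frame space "only for what might be needed" is a static question about a method body. A rough plain-Ruby approximation scans the tokens for calls known to touch the caller's frame. The method name `needs_frame?` and the `FRAME_AWARE` list are hypothetical and incomplete. A real implementation would work from knowledge of each core method rather than from name matching.

```ruby
require 'ripper'

# Hypothetical, incomplete list of core methods that read or write the caller's
# frame ($~ for regexp matches, local variables for binding/eval).
FRAME_AWARE = %w[=~ match sub gsub sub! gsub! scan last_match binding eval local_variables].freeze

# True if the source mentions anything that would need frame space for
# implicit variables; false means the frame can be skipped.
def needs_frame?(source)
  Ripper.lex(source).any? do |(_pos, type, tok)|
    case type
    when :on_ident, :on_op then FRAME_AWARE.include?(tok)
    when :on_backref then true          # $1, $&, ... read the frame's $~
    when :on_gvar then tok == "$~"
    else false
    end
  end
end
```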
  28. Better JVMs and JITs
     • Starting to explore runtimes beyond HotSpot's "C2" JIT
     • Eclipse OpenJ9, Azul Zing, Graal JIT, GraalVM
     • Mixed results so far, but we're working with those teams
     • Above all, we want to be a "JVM language"
     • No dependence on a specific runtime to execute well
  29. What Makes Truffle Nice?
     • Only have to implement an AST (albeit a very rich AST)
     • AST is annotated and specialized by hand or generated
     • Trace-specific code specialization plus partial evaluation
     • Object shape specialization with DynamicObject
     • Communication of guards, inlining, deoptimization to Graal JIT
     • Integration, optimization, tooling with other Truffle languages
  30. TruffleRuby
     • Most of core implemented in Ruby
     • Targeted features as specialized AST nodes
     • Dependent on Graal and Truffle to boil it down
     • Nearly complete set of Ruby features
     • C extensions, binding-of-caller, optimized evals
     • Ongoing optimization work
  31. Why Not Truffle
     • Many users are still on Java 8, or on non-HotSpot JVMs
     • Truffle languages are unusably slow without Graal JIT
     • TruffleRuby not ready for production after five years
     • No supported, production-ready runtime today
     • Java integration may be more cumbersome
  32. What Can We Do?
     • Work within the bounds of the JVM specification and JDK classes
     • Cooperate with JVM, JSR, and JEP folks to fill in the gaps
     • Creatively use the capabilities we have at the JVM level today
     • Focus on real-world users and their needs
     • Reconsider our options periodically
  33. Performance Status
     • Comparing small, medium, and large examples
     • Numerics, data structures, Rails database access
     • JRuby (C2, Graal JIT, GraalVM) vs CRuby vs TruffleRuby (Native CE, EE)
     • CRuby 2.6 JIT excluded; it does not appear to help these numbers
  34. Small Numeric Algorithms
     • Mandelbrot is the new Fibonacci
     • Simple fractal generator, single method, nearly all math ops
     • Worst-case scenario for JRuby: so many boxes
     • CRuby uses tagged pointers
     • TruffleRuby specializes code and gets help from Graal JIT
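
The tagged-pointer point is easiest to see with integers. CRuby represents a Fixnum n as the machine word `(n << 1) | 1`. The low bit marks "immediate integer, not a pointer", so arithmetic needs no heap object. A JVM implementation without value types often has to allocate a box (JRuby's `RubyFixnum` or `RubyFloat`) for results that escape. The sketch below models only integer tagging, using plain Ruby Integers as the "words". Floats on 64-bit CRuby use a separate, more involved flonum scheme that is left out here.

```ruby
# Model of low-bit integer tagging: value n is stored as the word (n << 1) | 1.
def tag(n)
  (n << 1) | 1
end

def untag(word)
  word >> 1            # arithmetic shift keeps the sign for negative values
end

def tagged?(word)
  (word & 1) == 1      # low bit set: an immediate integer, not an object pointer
end

# Add two tagged integers without untagging either one:
# (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1
def tagged_add(x, y)
  x + y - 1
end
```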
  35. def mandelbrot(size)
          sum = 0
          byte_acc = 0
          bit_num = 0
          y = 0
          while y < size
            ci = (2.0*y/size)-1.0
            x = 0
            while x < size
              zrzr = zr = 0.0
              zizi = zi = 0.0
              cr = (2.0*x/size)-1.5
              escape = 0b1
              ...
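
The slide shows only the start of the benchmark. Here is a complete, runnable sketch of the same style of loop. It is not the talk's code. The 50-iteration limit, the `> 4.0` escape test, and the XOR checksum over packed row bytes are assumptions modeled on the common Benchmarks Game formulation. On JRuby, every intermediate Float is a potential `RubyFloat` allocation unless escape analysis removes it.

```ruby
# Complete mandelbrot checksum loop (a sketch; constants are assumptions).
def mandelbrot(size)
  sum = 0
  byte_acc = 0
  bit_num = 0
  y = 0
  while y < size
    ci = (2.0 * y / size) - 1.0
    x = 0
    while x < size
      zrzr = zr = 0.0
      zizi = zi = 0.0
      cr = (2.0 * x / size) - 1.5
      escape = 0b1
      i = 0
      while i < 50                       # iteration limit: assumption
        tr = zrzr - zizi + cr
        ti = 2.0 * zr * zi + ci
        zr = tr
        zi = ti
        zrzr = zr * zr
        zizi = zi * zi
        if zrzr + zizi > 4.0             # escaped: point is outside the set
          escape = 0b0
          break
        end
        i += 1
      end
      byte_acc = (byte_acc << 1) | escape
      bit_num += 1
      if bit_num == 8 || x == size - 1   # flush a full byte, or pad the end of a row
        byte_acc <<= (8 - bit_num)
        sum ^= byte_acc
        byte_acc = 0
        bit_num = 0
      end
      x += 1
    end
    y += 1
  end
  sum
end
```

Calling `mandelbrot(750)` or larger gives a workload closer in spirit to a benchmark run.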
  36. Chart: bench_mandelbrot total execution time in seconds (lower is better)
     • Runtimes: CRuby 2.5, JRuby C2, JRuby Graal CE
     • Values shown: 0.129s, 1.12s, 3.57s
  37. Chart: bench_mandelbrot total execution time in seconds (lower is better)
     • Runtimes: JRuby Graal CE, JRuby Graal EE, TruffleRuby CE, TruffleRuby EE
     • Values shown: 0.123s, 0.111s, 0.118s, 0.129s
  38. Stupid Ruby Tricks
     • Occasionally users create throwaway arrays or hashes
     • `a > b ? b : a` vs `[a, b].sort[0]`
     • These tricks are not common... but ideally they should still be fast
     • Varargs also creates an Array
     • TruffleRuby's Array nodes give a big boost: 15-20x JRuby
     • More interested in C2 vs Graal
  39. def normal(a, b)
          a > b ? b : a
        end

        def array(a, b)
          [a, b].sort.at(0)
        end

        def varargs(*vals)
          vals.sort.at(0)
        end
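
The three methods always agree on the result, so any difference between them is allocation overhead alone. Below is a minimal timing harness that uses only the standard `benchmark` library. The iteration count is arbitrary, and the harness reports elapsed time, whereas the following charts report iterations per second.

```ruby
require 'benchmark'

def normal(a, b)
  a > b ? b : a
end

def array(a, b)
  [a, b].sort.at(0)      # literal Array, plus the new Array returned by sort
end

def varargs(*vals)
  vals.sort.at(0)        # the *vals collector allocates an Array, and sort another
end

ITERATIONS = 100_000     # arbitrary

Benchmark.bm(8) do |bm|
  bm.report("normal")  { ITERATIONS.times { |i| normal(i, 7) } }
  bm.report("array")   { ITERATIONS.times { |i| array(i, 7) } }
  bm.report("varargs") { ITERATIONS.times { |i| varargs(i, 7) } }
end
```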
  40. Chart: iterations per second (higher is better)
     • Runtimes: JRuby C2, JRuby Graal CE, JRuby Graal EE
     • Series: Normal, Array, Varargs
  41. Chart: iterations per second, Array and Varargs only
     • Runtimes: JRuby C2, JRuby Graal CE, JRuby Graal EE
     • Series: Array, Varargs
  42. Red/Black Tree
     • Larger, more practical demonstration
     • Construct, traverse, mutate, destroy
     • Object shaping plays a larger role
     • Typically a good case for JRuby, with performance near the CRuby C extension version
     • Remaining work is on object specialization and frame elimination
  43. require 'benchmark'

        # Algorithm based on "Introduction to Algorithms" by Cormen and others
        class RedBlackTree
          class Node
            attr_accessor :color
            attr_accessor :key
            attr_accessor :left
            attr_accessor :right
            attr_accessor :parent

            RED = :red
            BLACK = :black
            COLORS = [RED, BLACK].freeze

            def initialize(key, color = RED)
              raise ArgumentError, "Bad value for color parameter" unless COLORS.include?(color)
              @color = color
              @key = key
              @left = @right = @parent = NilNode.instance
            end

            def black?
              return color == BLACK
            end

            def red?
              return color == RED
            end
            ...
  44. def insert(x)
          insert_helper(x)

          x.color = Node::RED
          while x != root && x.parent.color == Node::RED
            if x.parent == x.parent.parent.left
              y = x.parent.parent.right
              if !y.nil? && y.color == Node::RED
                x.parent.color = Node::BLACK
                y.color = Node::BLACK
                x.parent.parent.color = Node::RED
                x = x.parent.parent
              else
                if x == x.parent.right
                  x = x.parent
                  left_rotate(x)
                end
                x.parent.color = Node::BLACK
                x.parent.parent.color = Node::RED
                right_rotate(x.parent.parent)
              end
            else
              y = x.parent.parent.left
              if !y.nil? && y.color == Node::RED
                x.parent.color = Node::BLACK
                y.color = Node::BLACK
                x.parent.parent.color = Node::RED
                x = x.parent.parent
              else
                if x == x.parent.left
                ...
  45. def rbt_bm
          n = 100_000
          a1 = []; n.times { a1 << rand(999_999) }
          a2 = []; n.times { a2 << rand(999_999) }

          start = Time.now

          tree = RedBlackTree.new
          n.times { |i| tree.add(i) }
          n.times { tree.delete(tree.root) }

          tree = RedBlackTree.new
          a1.each { |e| tree.add(e) }
          a2.each { |e| tree.search(e) }
          tree.inorder_walk { |key| key + 1 }
          tree.reverse_inorder_walk { |key| key + 1 }
          n.times { tree.minimum }
          n.times { tree.maximum }

          return Time.now - start
        end

        N = (ARGV[0] || 20).to_i
        N.times do
          puts rbt_bm.to_f
        end
  46. Chart: bench_red_black total execution time in seconds (lower is better)
     • Runtimes: CRuby 2.5, JRuby C2, JRuby Graal CE, JRuby Graal EE, TruffleRuby CE, TruffleRuby EE
     • Values shown: 0.142s, 0.315s, 0.481s, 0.573s, 0.403s, 1.222s
  47. ActiveRecord
     • Rails's database access layer, ORM
     • Largest part of the "full stack" Rails experience
     • If you have a perf problem, it's probably related to ActiveRecord
     • Largely Ruby code, with native bindings to database drivers
     • Real-world example that heavily leverages Ruby dynamism
     • Using sqlite3 for simplicity
  48. Chart: ActiveRecord selects, time for 1000 selects (lower is better)
     • Runtimes: CRuby 2.5, JRuby C2, JRuby Graal CE, JRuby Graal EE
     • Column types: binary, boolean, date, datetime, decimal, float, integer, string, text, time, timestamp, *
  49. Chart: Warmup Curve
     • Runtimes: JRuby C2, JRuby Graal CE, JRuby Graal EE
  50. Chart: Warmup vs TruffleRuby
     • Runtimes: JRuby C2, TruffleRuby CE, TruffleRuby EE
  51. Performance Notes
     • Current techniques in JRuby work quite well
     • InvokeDynamic inlining, optimizing Ruby + Java code together
     • Specialized objects helping to reduce memory and allocation overhead
     • JVM JITs can do a lot more for us!
     • Graal JIT showing great promise
     • Looking forward to other JVM JITs improving escape analysis
     • Performance even on Java 8 beats CRuby