A Tale of Two String Representations

Kevin Menard
September 08, 2016

Strings are used pervasively in Ruby. If we can make them faster, we can make many apps faster.

In this talk, I will be introducing ropes: an immutable tree-based data structure for implementing strings. While an old idea, ropes provide a new way of looking at string performance and mutability in Ruby. I will describe how we replaced a byte array-oriented string representation with a rope-based one in JRuby+Truffle. Then we’ll look at how moving to ropes affects common string operations, its immediate performance impact, and how ropes can have cascading performance implications for apps.

Transcript

  2. Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

    | A Tale of Two String Representations Kevin Menard Principal Member of Technical Staff Oracle Labs September 08, 2016
  3.

    | Safe Harbor Statement The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.
  4.

    | A Tale of Two String Representations
  5.

    | 'Hello'
  6.

    | Representations
  7.

    | 'Hello' • byteLength: 5 • encoding: UTF-8 • codeRange: 7BIT • bytes: 0x48 'H', 0x65 'e', 0x6C 'l', 0x6C 'l', 0x6F 'o' …
  8.

    | RStrings • Mutable • Flat representation – Requires contiguous memory • Byte-oriented • Shares memory via copy-on-write of byte array
  9.

    | ' Ruby' 'Hello' Concat 'Kaigi' Concat Substring
  10.

    | Ropes • Immutable • Tree representation – Does not require contiguous memory • Logical string fragment oriented • Shares memory by building new trees with existing nodes
  11.

    | JRuby+Truffle Ropes
  12.

    | Operations
  13.

    | Concatenation (RString) x = 'Hello' y = ' world' z = x + y • x: 0x48 0x65 0x6C 0x6C 0x6F • y: 0x20 0x77 0x6F 0x72 0x6C 0x64
  14.

    | Concatenation (RString) x = 'Hello' y = ' world' z = x + y • x: 0x48 0x65 0x6C 0x6C 0x6F • y: 0x20 0x77 0x6F 0x72 0x6C 0x64 • z: ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ???
  15.

    | Concatenation (RString) x = 'Hello' y = ' world' z = x + y • x: 0x48 0x65 0x6C 0x6C 0x6F • y: 0x20 0x77 0x6F 0x72 0x6C 0x64 • copy x into z • z: 0x48 0x65 0x6C 0x6C 0x6F ??? ??? ??? ??? ??? ???
  16.

    | Concatenation (RString) x = 'Hello' y = ' world' z = x + y • x: 0x48 0x65 0x6C 0x6C 0x6F • y: 0x20 0x77 0x6F 0x72 0x6C 0x64 • copy y into z • z: 0x48 0x65 0x6C 0x6C 0x6F 0x20 0x77 0x6F 0x72 0x6C 0x64
  17.

    | Concatenation (RString) • O(n) operation • 1 memory allocation (z.size = x.size + y.size) • 2 memory copy operations • 2 copies of x and y in memory
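
The allocate-then-copy sequence on the preceding slides can be sketched in plain Ruby. This is only an illustration: `flat_concat` is a hypothetical helper, and Arrays of Integers stand in for the contiguous byte buffers a real RString would use.

```ruby
# Hypothetical helper: mimics what a flat, byte-array string has to do for x + y.
# Arrays of Integers stand in for the contiguous byte buffers.
def flat_concat(x, y)
  x_bytes = x.bytes
  y_bytes = y.bytes

  # 1 allocation: a new buffer of x.bytesize + y.bytesize slots
  z_bytes = Array.new(x_bytes.size + y_bytes.size)

  # copy 1: x's bytes go to the front of the new buffer
  z_bytes[0, x_bytes.size] = x_bytes
  # copy 2: y's bytes go right after them
  z_bytes[x_bytes.size, y_bytes.size] = y_bytes

  # Turn the buffer back into a String with x's encoding
  z_bytes.pack('C*').force_encoding(x.encoding)
end

z = flat_concat('Hello', ' world')
puts z
```

Both inputs stay alive after the call, so their bytes exist twice: once in the originals and once in the result.
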
  18.

    | Concatenation (Rope) x = 'Hello' y = ' world' z = x + y • x: AsciiOnlyLeafRope 'Hello' • y: AsciiOnlyLeafRope ' world'
  19.

    | Concatenation (Rope) x = 'Hello' y = ' world' z = x + y • x: AsciiOnlyLeafRope 'Hello' • y: AsciiOnlyLeafRope ' world' • z: ConcatRope of x and y
  20.

    | Concatenation (Rope) • O(1) operation • 1 Node allocation • 0 memory copy operations • 1 copy of x and y in memory
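
A minimal sketch of why rope concatenation needs one node and no byte copies. The class names echo the slides (`AsciiOnlyLeafRope`, `ConcatRope`), but the module, the simplified `LeafRope`, and every method here are assumptions made for illustration, not the JRuby+Truffle implementation.

```ruby
module RopeConcatSketch
  # Leaf node: owns one string fragment. Frozen to model immutability.
  class LeafRope
    attr_reader :string

    def initialize(string)
      @string = string.dup.freeze
      freeze
    end

    def byte_length
      string.bytesize
    end

    def to_s
      string
    end
  end

  # Concat node: references two existing ropes instead of copying their bytes.
  class ConcatRope
    attr_reader :left, :right, :byte_length

    def initialize(left, right)
      @left = left
      @right = right
      # Constant-time bookkeeping; no bytes are touched here
      @byte_length = left.byte_length + right.byte_length
      freeze
    end

    # Flattening is deferred until someone actually needs the bytes
    def to_s
      left.to_s + right.to_s
    end
  end
end

x = RopeConcatSketch::LeafRope.new('Hello')
y = RopeConcatSketch::LeafRope.new(' world')
z = RopeConcatSketch::ConcatRope.new(x, y)
puts z.byte_length
puts z.to_s
```

Because `z` only points at `x` and `y`, the fragments exist once in memory no matter how many ropes reuse them.
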
  21.

    | Concatenation (+) Performance (Concat (+) Speed-up Relative to MRI) • MRI 2.3.1: 1.0 • JRuby 9.1.2.0: 1.2 • JRuby+Truffle (GraalVM 0.16) - RString: 0.9 • JRuby+Truffle (GraalVM 0.16) - Ropes: 83.6
  22.

    | Append (<<) Performance (Append (<<) Speed-up Relative to MRI) • MRI 2.3.1: 1.0 • JRuby 9.1.2.0: 2.0 • JRuby+Truffle (GraalVM 0.16) - RString: 3.1 • JRuby+Truffle (GraalVM 0.16) - Ropes: 2.3
  23.

    | Concatenation (+) vs Append (<<) Relative Performance (String Concatenation Performance Relative to Implementation, Lower is Better; Append (<<) = 1.0 in each implementation) • MRI 2.3.1: Concat (+) 39.6 • JRuby 9.1.2.0: Concat (+) 65.5 • JRuby+Truffle (GraalVM 0.16) - RString: Concat (+) 138.5 • JRuby+Truffle (GraalVM 0.16) - Ropes: Concat (+) 1.1
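
To get a rough feel for the `+` versus `<<` gap on your own Ruby, something like the following can be used. It is not the benchmark behind the charts (those come from the bench9000 suite listed in the references); the helper names and the iteration count are arbitrary choices for this sketch.

```ruby
require 'benchmark'

# Builds a string with +, which creates a new String on every iteration.
def build_with_plus(n)
  s = ''
  n.times { s = s + 'x' }
  s
end

# Builds a string with <<, which appends to the receiver in place.
def build_with_append(n)
  s = +''
  n.times { s << 'x' }
  s
end

n = 20_000
plus_time = Benchmark.realtime { build_with_plus(n) }
append_time = Benchmark.realtime { build_with_append(n) }

puts format('+  : %.4f s', plus_time)
puts format('<< : %.4f s', append_time)
```

Timings vary by machine and implementation, so treat the printed numbers as indicative only.
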
  24.

    | Concatenation Methods • + • <</concat • []= • insert • gsub/gsub! • sub/sub!
  25.

    | Substring (RString) x = 'Hello' y = x[1..3] • x: 0x48 0x65 0x6C 0x6C 0x6F
  26.

    | Substring (RString) x = 'Hello' y = x[1..3] • x: 0x48 0x65 0x6C 0x6C 0x6F • y: ??? ??? ???
  27.

    | Substring (RString) x = 'Hello' y = x[1..3] • x: 0x48 0x65 0x6C 0x6C 0x6F • copy part of x into y • y: 0x65 0x6C 0x6C
  28.

    | Substring (Rope) x = 'Hello' y = x[1..3] • x: AsciiOnlyLeafRope 'Hello'
  29.

    | Substring (Rope) x = 'Hello' y = x[1..3] • x: AsciiOnlyLeafRope 'Hello' • y: SubstringRope (offset: 1; byteLen: 3) referencing x
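
A substring rope only has to remember its parent plus an offset and a byte length. The sketch below assumes simplified, hypothetical classes (the `offset` and `byte_length` fields mirror the slide's `offset` and `byteLen`); it is not the actual JRuby+Truffle code.

```ruby
module RopeSubstringSketch
  # Leaf node holding one immutable fragment.
  class LeafRope
    attr_reader :string

    def initialize(string)
      @string = string.dup.freeze
      freeze
    end

    def byte_length
      string.bytesize
    end

    def to_s
      string
    end
  end

  # A view onto another rope: no bytes are copied when it is created.
  class SubstringRope
    attr_reader :child, :offset, :byte_length

    def initialize(child, offset, byte_length)
      @child = child
      @offset = offset
      @byte_length = byte_length
      freeze
    end

    # Bytes are only extracted when the flat string is requested
    def to_s
      child.to_s.byteslice(offset, byte_length)
    end
  end
end

x = RopeSubstringSketch::LeafRope.new('Hello')
y = RopeSubstringSketch::SubstringRope.new(x, 1, 3) # models y = x[1..3]
puts y.to_s
```
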
  30.

    | Substring Methods • [] • byteslice • chomp/chomp! • chop/chop! • chr • clear • each_char • lstrip/lstrip! • partition • rpartition • rstrip/rstrip! • scan • split • Regexp matches
  31.

    | Multiplication (RString) x = 'abc' y = x * 3 • x: 0x61 0x62 0x63
  32.

    | Multiplication (RString) x = 'abc' y = x * 3 • x: 0x61 0x62 0x63 • y: ??? ??? ??? ??? ??? ??? ??? ??? ???
  33.

    | Multiplication (RString) x = 'abc' y = x * 3 • x: 0x61 0x62 0x63 • copy x into y (1st time) • y: 0x61 0x62 0x63 ??? ??? ??? ??? ??? ???
  34.

    | Multiplication (RString) x = 'abc' y = x * 3 • x: 0x61 0x62 0x63 • copy x into y (2nd time) • y: 0x61 0x62 0x63 0x61 0x62 0x63 ??? ??? ???
  35.

    | Multiplication (RString) x = 'abc' y = x * 3 • x: 0x61 0x62 0x63 • copy x into y (3rd time) • y: 0x61 0x62 0x63 0x61 0x62 0x63 0x61 0x62 0x63
  36.

    | Multiplication (Rope) x = 'abc' y = x * 3 • x: AsciiOnlyLeafRope 'abc'
  37.

    | Multiplication (Rope) x = 'abc' y = x * 3 • x: AsciiOnlyLeafRope 'abc' • y: RepeatingRope (times: 3) referencing x
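
Multiplication follows the same pattern: instead of copying `x` three times, a repeating node can store the child and a count. The classes below are a hypothetical simplification (only `RepeatingRope` and its `times` field come from the slide).

```ruby
module RopeRepeatSketch
  # Leaf node holding one immutable fragment.
  class LeafRope
    attr_reader :string

    def initialize(string)
      @string = string.dup.freeze
      freeze
    end

    def byte_length
      string.bytesize
    end

    def to_s
      string
    end
  end

  # Represents child * times without materializing the repeated bytes.
  class RepeatingRope
    attr_reader :child, :times, :byte_length

    def initialize(child, times)
      @child = child
      @times = times
      # The total length is known up front by simple arithmetic
      @byte_length = child.byte_length * times
      freeze
    end

    # The repeated bytes are only produced on demand
    def to_s
      child.to_s * times
    end
  end
end

x = RopeRepeatSketch::LeafRope.new('abc')
y = RopeRepeatSketch::RepeatingRope.new(x, 3) # models y = x * 3
puts y.byte_length
puts y.to_s
```
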
  38.

    | 'Hello'
  39.

    | 'Hello'.bytesize => 5
  40.

    | 'Hello'.size => 5
  41.

    | String Length Performance (ASCII Characters; Speed-up Relative to MRI, "Hello".size / ("Hello" * 100).size) • MRI 2.3.1: 1.0 / 1.0 • JRuby 9.1.2.0: 1.9 / 1.7 • JRuby+Truffle (GraalVM 0.16) - RString: 17.4 / 17.1 • JRuby+Truffle (GraalVM 0.16) - Ropes: 17.1 / 17.0
  42.

    | 'こにちわ'
  43.

    | 'こにちわ'.bytesize => 12
  44.

    | 'こにちわ'.size => 4
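
The gap between `bytesize` (12) and `size` (4) exists because each of these characters occupies three bytes in UTF-8. With a flat byte array the byte count is known immediately, but the character count generally has to be derived from the bytes. A sketch of that derivation, assuming valid UTF-8 input (`utf8_char_count` is a hypothetical helper, not how any particular Ruby implementation does it):

```ruby
# Counts characters in a UTF-8 string by scanning every byte.
# Continuation bytes have the form 0b10xxxxxx; any other byte starts a character.
# Assumes the input is valid UTF-8.
def utf8_char_count(string)
  string.each_byte.count { |byte| (byte & 0b1100_0000) != 0b1000_0000 }
end

s = 'こにちわ'
puts s.bytesize
puts utf8_char_count(s)
```

The scan touches every byte, so its cost grows with the string, unlike the ASCII-only case where byte length and character length coincide.
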
  45.

    | String Length Performance (Variable Width Characters; Speed-up Relative to MRI, "こにちわ".size / ("こにちわ" * 100).size) • MRI 2.3.1: 1.0 / 1.0 • JRuby 9.1.2.0: 1.9 / 0.9 • JRuby+Truffle (GraalVM 0.16) - RString: 2.3 / 0.1 • JRuby+Truffle (GraalVM 0.16) - Ropes: 21.6 / 71.5
  46.

    | String Length Relative Performance (String Length Performance Relative to Implementation, Lower is Better; ("Hello" * 100).size = 1.0 in each implementation) • MRI 2.3.1: ("こにちわ" * 100).size 4.2 • JRuby 9.1.2.0: ("こにちわ" * 100).size 7.6 • JRuby+Truffle (GraalVM 0.16) - Ropes: ("こにちわ" * 100).size 1.0
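
One plausible way an immutable rope can make `size` cost the same for ASCII and variable-width strings is to record the character length when a node is built, since it can never change afterwards. The slides do not spell out this mechanism, so the sketch below is an assumption, with hypothetical class and field names.

```ruby
module RopeLengthSketch
  # Leaf: the character length is computed once, at construction.
  class LeafRope
    attr_reader :char_length

    def initialize(string)
      @string = string.dup.freeze
      @char_length = @string.size
      freeze
    end
  end

  # Concat: the length is the sum of the children's recorded lengths.
  class ConcatRope
    attr_reader :char_length

    def initialize(left, right)
      @left = left
      @right = right
      @char_length = left.char_length + right.char_length
      freeze
    end
  end

  # Repeat: the length is the child's recorded length times the count.
  class RepeatingRope
    attr_reader :char_length

    def initialize(child, times)
      @child = child
      @times = times
      @char_length = child.char_length * times
      freeze
    end
  end
end

leaf = RopeLengthSketch::LeafRope.new('こにちわ')
repeated = RopeLengthSketch::RepeatingRope.new(leaf, 100)
puts repeated.char_length # a field read, no byte scan
```
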
  47.

    | Methods Using String Length • [] • []= • center • chr • each_char • each_codepoint • each_line • index/rindex • insert • length/size • ljust/rjust • match • partition/rpartition • scrub • upto • etc.
  48.

    | Wrapping Up
  49.

    | Other Rope Benefits • Allow for interning of strings – Reduced memory consumption • Metaprogramming – Faster name comparisons (reference equality vs byte-wise comparison) • Final values are great for Graal (partial evaluation) • Implicitly thread-safe
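
The interning point can be illustrated with ordinary frozen Ruby strings standing in for immutable ropes. The table and the `intern` helper are hypothetical and only show the idea: equal content maps to one shared object, so later comparisons can use identity instead of walking the bytes.

```ruby
module RopeInternSketch
  # Content-keyed table of canonical, frozen instances (stand-ins for ropes).
  TABLE = {}

  # Returns the one shared instance for the given content.
  def self.intern(string)
    TABLE[string] ||= string.dup.freeze
  end
end

a = RopeInternSketch.intern('each_char')
b = RopeInternSketch.intern('each' + '_char') # built separately, same content

puts a.equal?(b) # identity check, no byte-wise comparison needed
```

Sharing like this is only safe because the shared value cannot be mutated behind another holder's back.
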
  50.

    | Deficiencies • Ruby Strings have two purposes: – Sequence of characters (what we’ve looked at thus far) – Byte buffer (what ropes are not very good at) • Node costs dominate smaller string fragments • While some ops are faster, others are slower – E.g., String#[] is logarithmic for balanced tree, degrades to linear • Less production experience – Lack of familiarity == lack of understanding?
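
The String#[] remark comes down to tree depth: an index lookup has to walk from the root to a leaf. The sketch below uses hypothetical `Leaf` and `Concat` classes and compares a tree built by repeated appends with a balanced one over the same fragments.

```ruby
module RopeIndexSketch
  # Leaf node: holds the bytes of one fragment.
  class Leaf
    attr_reader :byte_length

    def initialize(string)
      @string = string.dup.freeze
      @byte_length = @string.bytesize
    end

    def depth
      1
    end

    def byte_at(index)
      @string.getbyte(index)
    end
  end

  # Concat node: caches its length and depth at construction.
  class Concat
    attr_reader :byte_length, :depth

    def initialize(left, right)
      @left = left
      @right = right
      @byte_length = left.byte_length + right.byte_length
      @depth = 1 + [left.depth, right.depth].max
    end

    # One step down the tree per call, so the cost is proportional to depth
    def byte_at(index)
      if index < @left.byte_length
        @left.byte_at(index)
      else
        @right.byte_at(index - @left.byte_length)
      end
    end
  end

  # Left-leaning tree, as produced by appending fragments one at a time.
  def self.chain(fragments)
    fragments.map { |f| Leaf.new(f) }.reduce { |rope, leaf| Concat.new(rope, leaf) }
  end

  # Balanced tree over the same fragments.
  def self.balanced(fragments)
    raise ArgumentError, 'need at least one fragment' if fragments.empty?
    return Leaf.new(fragments.first) if fragments.size == 1

    mid = fragments.size / 2
    Concat.new(balanced(fragments[0...mid]), balanced(fragments[mid..-1]))
  end
end

fragments = %w[a b c d e f g h]
puts RopeIndexSketch.chain(fragments).depth
puts RopeIndexSketch.balanced(fragments).depth
```

With a flat byte array the same lookup is a single indexed read, which is the trade-off the slide points at.
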
  51.

    | References • Ropes – Ropes: an Alternative to Strings – Boehm et al. (1995) – http://www.spinute.org/ruby/gsoc2016/english.html • RString – Ruby Under a Microscope – Pat Shaughnessy – http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters – http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values • Benchmarks – https://github.com/nirvdrum/bench9000/tree/rubykaigi_2016
  52.

    | Recap • Ropes are an alternative way to represent Ruby strings – Interesting bridge from mutable to effectively immutable – Working out quite well with JRuby+Truffle & Graal • I think it’ll work out well for others • We need to make core operations fast and compact • Change approach to get out of local maxima
  53.

    | Acknowledgements Benoit Daloze Petr Chalupa Brandon Fish Kevin Menard Chris Seaton JRuby & Rubinius Contributors Oracle Danilo Ansaloni Stefan Anzinger Cosmin Basca Daniele Bonetta Matthias Brantner Petr Chalupa Jürgen Christ Laurent Daynès Gilles Duboscq Martin Entlicher Bastian Hossbach Christian Humer Mick Jordan Vojin Jovanovic Peter Kessler Oracle (continued) David Leopoldseder Kevin Menard Jakub Podlešák Aleksandar Prokopec Tom Rodriguez Roland Schatz Chris Seaton Doug Simon Štěpán Šindelář Zbyněk Šlajchrt Lukas Stadler Codrut Stancu Jan Štola Jaroslav Tulach Michael Van De Vanter Adam Welc Christian Wimmer Christian Wirth Paul Wögerer Mario Wolczko Andreas Wöß Thomas Würthinger JKU Linz Prof. Hanspeter Mössenböck Benoit Daloze Josef Eisl Thomas Feichtinger Matthias Grimmer Christian Häubl Josef Haider Christian Huber Stefan Marr Manuel Rigger Stefan Rumzucker Bernhard Urban University of Edinburgh Christophe Dubach Juan José Fumero Alfonso Ranjeet Singh Toomas Remmelg LaBRI Floréal Morandat University of California, Irvine Prof. Michael Franz Gulfem Savrun Yeniceri Wei Zhang Purdue University Prof. Jan Vitek Tomas Kalibera Petr Maj Lei Zhao T. U. Dortmund Prof. Peter Marwedel Helena Kotthaus Ingo Korb University of California, Davis Prof. Duncan Temple Lang Nicholas Ulle University of Lugano, Switzerland Prof. Walter Binder Sun Haiyang Yudi Zheng Oracle Interns Brian Belleville Miguel Garcia Shams Imam Alexey Karyakin Stephen Kell Andreas Kunft Volker Lanting Gero Leinemann Julian Lettner Joe Nash David Piorkowski Gregor Richards Robert Seilbeck Rifat Shariyar Alumni Erik Eckstein Michael Haupt Christos Kotselidis Hyunjin Lee David Leibs Chris Thalinger Till Westmann
  54.

    | kevin.j.menard@oracle.com @nirvdrum https://github.com/jruby/jruby/wiki/Truffle (or just search for ‘jruby truffle’)