Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

A Tale of Two String Representations
Kevin Menard, Principal Member of Technical Staff, Oracle Labs
September 08, 2016
Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

Slide 3

Slide 3 text

Safe Harbor Statement
The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle.
Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

Slide 4

Slide 4 text

A Tale of Two String Representations

Slide 5

Slide 5 text

'Hello'

Slide 6

Slide 6 text

Representations

Slide 7

Slide 7 text

'Hello'
bytes: 0x48 ('H') 0x65 ('e') 0x6C ('l') 0x6C ('l') 0x6F ('o') …
byteLength: 5
encoding: UTF-8
codeRange: 7BIT
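The fields in this diagram can be inspected from plain Ruby. The following is a minimal sketch using standard String methods. `ascii_only?` is used here as a stand-in for the 7BIT code range, and the `encode('UTF-8')` call is an assumption added only so the example does not depend on the source encoding.

```ruby
# Inspect the pieces of the representation shown on the slide.
s = 'Hello'.encode('UTF-8')

hex         = s.bytes.map { |b| format('0x%02X', b) } # raw bytes as hex
byte_length = s.bytesize                              # number of bytes
encoding    = s.encoding                              # Encoding::UTF_8
seven_bit   = s.ascii_only?                           # true when every byte is < 0x80

puts hex.join(' ')
puts "byteLength=#{byte_length} encoding=#{encoding} 7bit=#{seven_bit}"
```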

Slide 8

Slide 8 text

RStrings
• Mutable
• Flat representation
  – Requires contiguous memory
• Byte-oriented
• Shares memory via copy-on-write of byte array

Slide 9

Slide 9 text

[Diagram: a rope tree built from the leaves 'Hello', ' Ruby', and 'Kaigi', combined through two Concat nodes and a Substring node]

Slide 10

Slide 10 text

Ropes
• Immutable
• Tree representation
  – Does not require contiguous memory
• Logical string fragment oriented
• Shares memory by building new trees with existing nodes

Slide 11

Slide 11 text

JRuby+Truffle Ropes

Slide 12

Slide 12 text

Operations

Slide 13

Slide 13 text

Concatenation (RString)
x = 'Hello'
y = ' world'
z = x + y
x: 0x48 0x65 0x6C 0x6C 0x6F
y: 0x20 0x77 0x6F 0x72 0x6C 0x64

Slide 14

Slide 14 text

Concatenation (RString)
x = 'Hello'
y = ' world'
z = x + y
x: 0x48 0x65 0x6C 0x6C 0x6F
y: 0x20 0x77 0x6F 0x72 0x6C 0x64
z: ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ???

Slide 15

Slide 15 text

Concatenation (RString)
x = 'Hello'
y = ' world'
z = x + y
x: 0x48 0x65 0x6C 0x6C 0x6F
y: 0x20 0x77 0x6F 0x72 0x6C 0x64
z: 0x48 0x65 0x6C 0x6C 0x6F ??? ??? ??? ??? ??? ??? (copy of x)

Slide 16

Slide 16 text

Concatenation (RString)
x = 'Hello'
y = ' world'
z = x + y
x: 0x48 0x65 0x6C 0x6C 0x6F
y: 0x20 0x77 0x6F 0x72 0x6C 0x64
z: 0x48 0x65 0x6C 0x6C 0x6F 0x20 0x77 0x6F 0x72 0x6C 0x64 (copy of y)

Slide 17

Slide 17 text

Concatenation (RString)
• O(n) operation
• 1 memory allocation (z.size = x.size + y.size)
• 2 memory copy operations
• 2 copies of x and y in memory
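The observable side of these points can be checked from Ruby itself. This is a small sketch, not a view into the interpreter's memory: it only shows that `+` returns a separate string whose byte size is the sum of its operands, and that later changes to the result leave the operands alone.

```ruby
x = 'Hello'
y = ' world'
z = x + y                       # allocates a new string and copies both operands

sizes_add = (z.bytesize == x.bytesize + y.bytesize)

z << '!'                        # mutating the result...
operands_untouched = (x == 'Hello' && y == ' world') # ...does not affect x or y

puts z
```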

Slide 18

Slide 18 text

Concatenation (Rope)
x = 'Hello'
y = ' world'
z = x + y
x: AsciiOnlyLeafRope 'Hello'
y: AsciiOnlyLeafRope ' world'

Slide 19

Slide 19 text

Concatenation (Rope)
x = 'Hello'
y = ' world'
z = x + y
x: AsciiOnlyLeafRope 'Hello'
y: AsciiOnlyLeafRope ' world'
z: ConcatRope (x, y)

Slide 20

Slide 20 text

Concatenation (Rope)
• O(1) operation
• 1 Node allocation
• 0 memory copy operations
• 1 copy of x and y in memory
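A toy version of rope concatenation makes these points concrete. The `LeafRope` and `ConcatRope` classes below are hypothetical, loosely named after the node labels on the slides. They are a sketch of the idea, not the JRuby+Truffle implementation.

```ruby
# Leaf node: owns the actual bytes (kept frozen, since ropes are immutable).
class LeafRope
  attr_reader :bytes

  def initialize(string)
    @bytes = string.b.freeze
  end

  def byte_length
    @bytes.bytesize
  end

  def flatten
    @bytes
  end
end

# Interior node: stores references to its children, never a copy of their bytes.
class ConcatRope
  attr_reader :left, :right, :byte_length

  def initialize(left, right)
    @left = left
    @right = right
    @byte_length = left.byte_length + right.byte_length
  end

  # Bytes are only materialized when a flat string is actually needed.
  def flatten
    left.flatten + right.flatten
  end
end

x = LeafRope.new('Hello')
y = LeafRope.new(' world')
z = ConcatRope.new(x, y)   # one node allocation, no byte copying

puts z.byte_length
puts z.flatten
```

The concatenation cost is independent of the operand sizes because the new node only records two references and a precomputed length.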

Slide 21

Slide 21 text

Concatenation (+) Performance
[Bar chart: Concat (+) speed-up relative to MRI. Legend: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - RString; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 1.2, 0.9, 83.6.]

Slide 22

Slide 22 text

Append (<<) Performance
[Bar chart: Append (<<) speed-up relative to MRI. Legend: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - RString; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 2.0, 3.1, 2.3.]

Slide 23

Slide 23 text

Concatenation (+) vs Append (<<) Relative Performance
[Bar chart: String concatenation performance relative to implementation (lower is better). Series: Concat (+); Append (<<). Implementations: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - RString; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 1.0, 1.0, 1.0, 39.6, 65.5, 138.5, 1.1.]

Slide 24

Slide 24 text

Concatenation Methods
• +
• <<
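The two methods differ in whether the receiver is changed. A short sketch of that difference follows. `String.new` is used only so the receiver is guaranteed to be mutable even if frozen string literals are enabled.

```ruby
x = String.new('Hello')
y = ' world'

z = x + y            # returns a new string; x is unchanged
x_before = x.dup

id_before = x.object_id
x << y               # appends in place; x is still the same object
same_object = (x.object_id == id_before)

puts z
puts x
```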

Slide 25

Slide 25 text

Substring (RString)
x = 'Hello'
y = x[1..3]
x: 0x48 0x65 0x6C 0x6C 0x6F

Slide 26

Slide 26 text

Substring (RString)
x = 'Hello'
y = x[1..3]
x: 0x48 0x65 0x6C 0x6C 0x6F
y: ??? ??? ???

Slide 27

Slide 27 text

Substring (RString)
x = 'Hello'
y = x[1..3]
x: 0x48 0x65 0x6C 0x6C 0x6F
y: 0x65 0x6C 0x6C (copy)

Slide 28

Slide 28 text

Substring (Rope)
x = 'Hello'
y = x[1..3]
x: AsciiOnlyLeafRope 'Hello'

Slide 29

Slide 29 text

Substring (Rope)
x = 'Hello'
y = x[1..3]
x: AsciiOnlyLeafRope 'Hello'
y: SubstringRope offset: 1; byteLen: 3
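The substring node can be sketched the same way: it records an offset and a byte length into an existing rope instead of copying bytes. The class names are hypothetical and the code is an illustration under that assumption, not the actual JRuby+Truffle node.

```ruby
# Leaf node holding the bytes.
class LeafRope
  attr_reader :bytes

  def initialize(string)
    @bytes = string.b.freeze
  end

  def byte_length
    @bytes.bytesize
  end

  def flatten
    @bytes
  end
end

# A view onto part of another rope: just a reference plus two integers.
class SubstringRope
  attr_reader :child, :offset, :byte_length

  def initialize(child, offset, byte_length)
    @child = child
    @offset = offset
    @byte_length = byte_length
  end

  # Bytes are sliced out only when a flat string is requested.
  def flatten
    child.flatten.byteslice(offset, byte_length)
  end
end

x = LeafRope.new('Hello')
y = SubstringRope.new(x, 1, 3)   # corresponds to x[1..3]

puts y.flatten
```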

Slide 30

Slide 30 text

Substring Methods
• []
• byteslice
• chomp/chomp!
• chop/chop!
• chr
• clear
• each_char
• lstrip/lstrip!
• partition
• rpartition
• rstrip/rstrip!
• scan
• split
• Regexp matches

Slide 31

Slide 31 text

Multiplication (RString)
x = 'abc'
y = x * 3
x: 0x61 0x62 0x63

Slide 32

Slide 32 text

Multiplication (RString)
x = 'abc'
y = x * 3
x: 0x61 0x62 0x63
y: ??? ??? ??? ??? ??? ??? ??? ??? ???

Slide 33

Slide 33 text

Multiplication (RString)
x = 'abc'
y = x * 3
x: 0x61 0x62 0x63
y: 0x61 0x62 0x63 ??? ??? ??? ??? ??? ??? (copy)

Slide 34

Slide 34 text

Multiplication (RString)
x = 'abc'
y = x * 3
x: 0x61 0x62 0x63
y: 0x61 0x62 0x63 0x61 0x62 0x63 ??? ??? ??? (copy)

Slide 35

Slide 35 text

Multiplication (RString)
x = 'abc'
y = x * 3
x: 0x61 0x62 0x63
y: 0x61 0x62 0x63 0x61 0x62 0x63 0x61 0x62 0x63 (copy)

Slide 36

Slide 36 text

Multiplication (Rope)
x = 'abc'
y = x * 3
x: AsciiOnlyLeafRope 'abc'

Slide 37

Slide 37 text

Multiplication (Rope)
x = 'abc'
y = x * 3
x: AsciiOnlyLeafRope 'abc'
y: RepeatingRope times: 3
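A repeating node follows the same pattern as the other rope nodes: it stores the child and a repeat count, so the repeated bytes never need to exist until someone asks for a flat string. The classes below are hypothetical names for a sketch, not the real implementation.

```ruby
# Leaf node holding the bytes.
class LeafRope
  attr_reader :bytes

  def initialize(string)
    @bytes = string.b.freeze
  end

  def byte_length
    @bytes.bytesize
  end

  def flatten
    @bytes
  end
end

# Represents `child * times` without copying the child's bytes.
class RepeatingRope
  attr_reader :child, :times

  def initialize(child, times)
    @child = child
    @times = times
  end

  # The length is computed arithmetically, not by building the string.
  def byte_length
    child.byte_length * times
  end

  def flatten
    child.flatten * times
  end
end

x = LeafRope.new('abc')
y = RepeatingRope.new(x, 3)

puts y.byte_length
puts y.flatten
```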

Slide 38

Slide 38 text

'Hello'

Slide 39

Slide 39 text

'Hello'.bytesize => 5

Slide 40

Slide 40 text

'Hello'.size => 5

Slide 41

Slide 41 text

String Length Performance (ASCII Characters)
[Bar chart: speed-up relative to MRI for "Hello".size and ("Hello" * 100).size. Legend: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - RString; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 1.0, 1.9, 1.7, 17.4, 17.1, 17.1, 17.0.]

Slide 42

Slide 42 text

'こにちわ'

Slide 43

Slide 43 text

'こにちわ'.bytesize => 12

Slide 44

Slide 44 text

'こにちわ'.size => 4
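The gap between the two numbers comes from UTF-8 being variable width: each of these four characters occupies three bytes. A minimal sketch follows that counts characters by hand by skipping UTF-8 continuation bytes. It only illustrates why a character count needs a scan when the string is not 7-bit, and is not a claim about how any particular Ruby implementation computes it. The string is written with `\u` escapes so that it is UTF-8 regardless of source encoding.

```ruby
s = "\u3053\u306B\u3061\u308F"   # the same four characters as 'こにちわ'

byte_count = s.bytesize          # bytes in the buffer
char_count = s.size              # characters, as Ruby reports them

# Count characters manually: continuation bytes look like 0b10xxxxxx,
# so every byte that is NOT a continuation byte starts a new character.
manual_count = s.each_byte.count { |b| (b & 0xC0) != 0x80 }

puts "bytes=#{byte_count} chars=#{char_count} manual=#{manual_count}"
```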

Slide 45

Slide 45 text

String Length Performance (Variable Width Characters)
[Bar chart: speed-up relative to MRI for "こにちわ".size and ("こにちわ" * 100).size. Legend: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - RString; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 1.0, 1.9, 0.9, 2.3, 0.1, 21.6, 71.5.]

Slide 46

Slide 46 text

String Length Relative Performance
[Bar chart: String length performance relative to implementation (lower is better). Series: ("こにちわ" * 100).size; ("Hello" * 100).size. Implementations: MRI 2.3.1; JRuby 9.1.2.0; JRuby+Truffle (GraalVM 0.16) - Ropes. Data labels as extracted: 1.0, 1.0, 1.0, 4.2, 7.6, 1.0.]

Slide 47

Slide 47 text

Methods Using String Length
• []
• []=
• center
• chr
• each_char
• each_codepoint
• each_line
• index/rindex
• insert
• length/size
• ljust/rjust
• match
• partition/rpartition
• scrub
• upto
• etc.

Slide 48

Slide 48 text

Wrapping Up

Slide 49

Slide 49 text

Other Rope Benefits
• Allow for interning of strings
  – Reduced memory consumption
• Metaprogramming
  – Faster name comparisons (reference equality vs byte-wise comparison)
• Final values are great for Graal (partial evaluation)
• Implicitly thread-safe
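The difference between reference equality and byte-wise comparison can be seen in ordinary Ruby. The sketch below uses symbols as an analogy for interned values. It is not a demonstration of rope interning itself.

```ruby
a = String.new('method_name')
b = String.new('method_name')

bytewise_equal  = (a == b)       # compares contents, byte by byte
same_reference  = a.equal?(b)    # false: two separate objects

# Symbols are interned, so equal names are the very same object and
# comparing them is a single reference check.
interned_equal = a.to_sym.equal?(b.to_sym)

puts "==: #{bytewise_equal}, equal?: #{same_reference}, symbols equal?: #{interned_equal}"
```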

Slide 50

Slide 50 text

Deficiencies
• Ruby Strings have two purposes:
  – Sequence of characters (what we’ve looked at thus far)
  – Byte buffer (what ropes are not very good at)
• Node costs dominate smaller string fragments
• While some ops are faster, others are slower
  – E.g., String#[] is logarithmic for balanced tree, degrades to linear
• Less production experience
  – Lack of familiarity == lack of understanding?
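The indexing cost mentioned above follows from how a lookup walks the tree: one step per level from the root to a leaf. The sketch below uses hypothetical toy classes (not the JRuby+Truffle ones) and a helper named `balance`, also hypothetical, to compare the depth of a left-deep chain with a balanced tree over the same eight leaves.

```ruby
class LeafRope
  def initialize(string)
    @bytes = string.b.freeze
  end

  def byte_length
    @bytes.bytesize
  end

  def depth
    0
  end

  def byte_at(index)
    @bytes.getbyte(index)
  end
end

class ConcatRope
  attr_reader :left, :right, :byte_length, :depth

  def initialize(left, right)
    @left = left
    @right = right
    @byte_length = left.byte_length + right.byte_length
    @depth = [left.depth, right.depth].max + 1
  end

  # Descend into whichever child contains the index: one step per tree level.
  def byte_at(index)
    if index < left.byte_length
      left.byte_at(index)
    else
      right.byte_at(index - left.byte_length)
    end
  end
end

leaves = %w[a b c d e f g h].map { |s| LeafRope.new(s) }

# Left-deep chain, the shape repeated appends produce: depth grows linearly.
chain = leaves.reduce { |acc, leaf| ConcatRope.new(acc, leaf) }

# Balanced tree over the same leaves: depth grows logarithmically.
def balance(nodes)
  return nodes.first if nodes.size == 1

  mid = nodes.size / 2
  ConcatRope.new(balance(nodes[0...mid]), balance(nodes[mid..-1]))
end

balanced = balance(leaves)

puts "chain depth=#{chain.depth}, balanced depth=#{balanced.depth}"
```

Both trees return the same byte for any index. Only the number of levels visited differs.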

Slide 51

Slide 51 text

References
• Ropes
  – Ropes: an Alternative to Strings, Boehm et al. (1995)
  – http://www.spinute.org/ruby/gsoc2016/english.html
• RString
  – Ruby Under a Microscope, Pat Shaughnessy
  – http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
  – http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values
• Benchmarks
  – https://github.com/nirvdrum/bench9000/tree/rubykaigi_2016

Slide 52

Slide 52 text

Recap
• Ropes are an alternative way to represent Ruby strings
  – Interesting bridge from mutable to effectively immutable
  – Working out quite well with JRuby+Truffle & Graal
• I think it’ll work out well for others
• We need to make core operations fast and compact
• Change approach to get out of local maxima

Slide 53

Slide 53 text

Acknowledgements
Benoit Daloze, Petr Chalupa, Brandon Fish, Kevin Menard, Chris Seaton
JRuby & Rubinius Contributors
Oracle: Danilo Ansaloni, Stefan Anzinger, Cosmin Basca, Daniele Bonetta, Matthias Brantner, Petr Chalupa, Jürgen Christ, Laurent Daynès, Gilles Duboscq, Martin Entlicher, Bastian Hossbach, Christian Humer, Mick Jordan, Vojin Jovanovic, Peter Kessler, David Leopoldseder, Kevin Menard, Jakub Podlešák, Aleksandar Prokopec, Tom Rodriguez, Roland Schatz, Chris Seaton, Doug Simon, Štěpán Šindelář, Zbyněk Šlajchrt, Lukas Stadler, Codrut Stancu, Jan Štola, Jaroslav Tulach, Michael Van De Vanter, Adam Welc, Christian Wimmer, Christian Wirth, Paul Wögerer, Mario Wolczko, Andreas Wöß, Thomas Würthinger
JKU Linz: Prof. Hanspeter Mössenböck, Benoit Daloze, Josef Eisl, Thomas Feichtinger, Matthias Grimmer, Christian Häubl, Josef Haider, Christian Huber, Stefan Marr, Manuel Rigger, Stefan Rumzucker, Bernhard Urban
University of Edinburgh: Christophe Dubach, Juan José Fumero Alfonso, Ranjeet Singh, Toomas Remmelg
LaBRI: Floréal Morandat
University of California, Irvine: Prof. Michael Franz, Gulfem Savrun Yeniceri, Wei Zhang
Purdue University: Prof. Jan Vitek, Tomas Kalibera, Petr Maj, Lei Zhao
T. U. Dortmund: Prof. Peter Marwedel, Helena Kotthaus, Ingo Korb
University of California, Davis: Prof. Duncan Temple Lang, Nicholas Ulle
University of Lugano, Switzerland: Prof. Walter Binder, Sun Haiyang, Yudi Zheng
Oracle Interns: Brian Belleville, Miguel Garcia, Shams Imam, Alexey Karyakin, Stephen Kell, Andreas Kunft, Volker Lanting, Gero Leinemann, Julian Lettner, Joe Nash, David Piorkowski, Gregor Richards, Robert Seilbeck, Rifat Shariyar
Alumni: Erik Eckstein, Michael Haupt, Christos Kotselidis, Hyunjin Lee, David Leibs, Chris Thalinger, Till Westmann

Slide 54

Slide 54 text

@nirvdrum
https://github.com/jruby/jruby/wiki/Truffle (or just search for ‘jruby truffle’)

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content