Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "Unoptimizable" Using Old Ideas

Stefan Marr
October 20, 2019

Do metaobject protocols and type checks have much in common? Perhaps not from a language perspective. Under the hood of a modern virtual machine, however, they show very similar behavior and can be optimized in very similar ways.

This talk will go back to the days of Terminator 2, The Naked Gun 2 1/2, and Star Trek VI. We will revisit the early days of just-in-time compilation, the basic insights that are still true, and see how to apply them to metaprogramming techniques of different shapes and forms.

Transcript

  1. Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the “Unoptimizable” Using Old Ideas
    Stefan Marr, Athens, October 2019
    Creative Commons Attribution-ShareAlike 4.0 License
  2. Got a Question? Feel free to interrupt me!

  3. The “Unoptimizable”

  4. The Knights on the Quest for Excellent Performance
    Excellent Performance; Reflection Metaprogramming; Gradual Typing; Metaobject Protocols
  5. Metaprogramming
    Reflection, Introspection, Intercession
        obj.invoke("methodName", [arg1])
        obj.getField("name")
        obj.setField("name", value)
        obj.getDefinedFields()
        …
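
The calls on slide 5 are a language-neutral pseudo-API. As a rough illustration only, the standard JavaScript `Reflect` object offers close equivalents. The `Counter` class and its `add` method are hypothetical names made up for this sketch.

```javascript
// Hypothetical example class; only the Reflect calls are standard JavaScript.
class Counter {
  constructor() { this.count = 0; }
  add(n) { this.count += n; return this.count; }
}

const obj = new Counter();

// obj.invoke("add", [2]): look the method up by name, then apply it.
const method = Reflect.get(obj, "add");
const afterAdd = Reflect.apply(method, obj, [2]);

// obj.getField("count") and obj.setField("count", 10)
const before = Reflect.get(obj, "count");
Reflect.set(obj, "count", 10);

// obj.getDefinedFields(): the own property keys of the instance
const fields = Reflect.ownKeys(obj);
```

Every access goes through a name that is only known at run time, which is what makes this style look hard to optimize.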
  6. Metaprogramming
    meth.invoke(): 1.7x slower
    Dynamic Proxies: 7.5x slower
    Reflection Metaprogramming; Excellent Performance; Don’t Use It Inn
  7. Metaobject Protocols
        LoggingClass extends Metaclass {
          writeToField(obj, fieldName, value) {
            console.log(`${fieldName}: ${value}`)
            obj.setField(fieldName, value)
          }
        }
  8. Metaobject Protocols
    Same code as slide 7, with the added label: AOP
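
JavaScript has no `Metaclass` to extend, but its standard `Proxy` object can intercept field writes in a similar spirit. The following is a minimal sketch under that assumption, not the mechanism used in the talk. The names `log` and `withWriteLogging` are hypothetical, and the log is an array rather than `console.log` so the effect can be inspected.

```javascript
// Collected log lines; the slide prints them with console.log instead.
const log = [];

// Wrap an object so that every field write is logged before it is stored.
function withWriteLogging(target) {
  return new Proxy(target, {
    // Plays the role of writeToField(obj, fieldName, value) on the slide.
    set(obj, fieldName, value, receiver) {
      log.push(`${String(fieldName)}: ${value}`);
      return Reflect.set(obj, fieldName, value, receiver);
    },
  });
}

const point = withWriteLogging({ x: 0 });
point.x = 12; // logged, then stored
point.y = 7;  // new fields are intercepted too
```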
  9. Gradual Typing
        async addMessage(user: User, message) {
          const msg =
            `<img src="/image/${user.profilePicture}"> ${message}</span>`;
          this.outputElem.insertAdjacentHTML('beforeend', msg);
        }
  10. Gradual Typing
    Same code as slide 9.
    Somewhat True… The whole Truth is a little more complex
  11. The Knights Found New Homes
    Excellent Performance; Reflection Metaprogramming; Gradual Typing; Metaobject Protocols; AOP; Land of Engineering Short Cuts
  12. The story could have ended here…

  13. If it weren’t for the 90’s

  14. An “everything has been done in Lisp” Talk
    Smalltalk, Self
  15. The (Movie) Heroes of ‘91
    Polymorphic Inline Caches: Terminator 2
    Just-in-time Compilation: The Naked Gun 2 1/2
    Maps (Hidden Classes): Star Trek VI
  16. Key Papers*
    *Lots of necessary work afterwards, but lay foundations
  17. POLYMORPHIC INLINE CACHES (PICS)
    A technique for lookup caching.
  18. A Class Hierarchy of Widgets
        class Widget {
          fitsInto(width) { return this.width <= width; }
        }
        class Button extends Widget {}
        class RadioButton extends Button {}

        fn findAllThatFit(arr, width) {
          const result = [];
          for (const w of arr)
            if (w.fitsInto(width))
              result.append(w)
          return result;
        }
  19. Lookups can be frequent and costly
    Same code as slide 18.
    Class chain: RadioButton, superclass Button, superclass Widget. fitsInto is defined in Widget.
    For each fitsInto call: hasMethod 3x, getSuperclass 2x
  20. Solution: Lookup Caching
    w.fitsInto(width) could be various functions, but we don’t need to do the same lookup repeatedly.
    PIC: check for receiver and jump to method directly in machine code.
    PIC entries: method, method (in case we see a widget of a different class)
    Useful because:
    • Most sends are monomorphic
    • Few are polymorphic
    • And just a couple are megamorphic
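
A polymorphic inline cache lives in generated machine code, so plain JavaScript can only imitate the idea. The sketch below is such an imitation, under the assumption that one closure stands in for one call site. `lookup`, `makeCallSite`, `slowLookups`, and `maxEntries` are hypothetical names, and the slides' `fn` and `append` are written as `function` and `push` so the code runs.

```javascript
class Widget {
  constructor(width) { this.width = width; }
  fitsInto(width) { return this.width <= width; }
}
class Button extends Widget {}
class RadioButton extends Button {}

// Slow path: walk up the class hierarchy until a class defines the method.
let slowLookups = 0;
function lookup(cls, name) {
  slowLookups++;
  for (let proto = cls.prototype; proto !== null; proto = Object.getPrototypeOf(proto)) {
    if (Object.prototype.hasOwnProperty.call(proto, name)) return proto[name];
  }
  throw new TypeError(`method not found: ${name}`);
}

// One cache per call site: a short list of (receiver class, method) entries.
function makeCallSite(name, maxEntries = 4) {
  const entries = [];
  return function send(receiver, ...args) {
    const cls = receiver.constructor;
    for (const entry of entries) {
      if (entry.cls === cls) return entry.method.apply(receiver, args); // hit
    }
    const method = lookup(cls, name); // miss: do the full lookup once
    // Sites that see more classes than maxEntries stay on the slow path.
    if (entries.length < maxEntries) entries.push({ cls, method });
    return method.apply(receiver, args);
  };
}

const fitsIntoSite = makeCallSite("fitsInto");

function findAllThatFit(arr, width) {
  const result = [];
  for (const w of arr) {
    if (fitsIntoSite(w, width)) result.push(w);
  }
  return result;
}

const widgets = [new RadioButton(10), new RadioButton(30), new Button(5), new RadioButton(8)];
const fitting = findAllThatFit(widgets, 10);
```

With four widgets of two classes, the hierarchy is walked twice instead of four times. In a loop over thousands of widgets the count would still be two.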
  21. The Terminator PIC
    Eliminates Potential Variability
  22. JUST IN TIME COMPILATION
    Generating Machine Code at Run Time
  23. Just-in-time Compilation
    • Produces native code, optimized, avoiding the overhead of interpretation
    • At run time, can utilize knowledge about program execution
    • Ahead-of-time compilation, i.e., classic static compilation, can only guess how a program is used
    Missing Slide added after the talk
  24. With PICs, we can know
    Same code as slide 18.
    Annotations: RadioButton, Widget.fitsInto, Array, RadioButton, Integer
  25. And Generate Efficient Code
        fn findAllThatFit(arr, width) {
          const result = [];
          for (const w of arr)
            if (w.width(field) <=(int) width)
              result.append(w)
          return result;
        }
    inlined fitsInto; Specialized to types; Resulting in Efficient Machine Code
  26. Polymorphic Inline Caches
    • Give us type information
    • Enable method inlining
  27. Our Hero has a lot of Luck and Achieves his Goal
    Relies on Developers not utilizing flexibility
    And, understanding 90’s humor
  28. MAPS, HIDDEN CLASSES, OBJECT SHAPES
    Structural Information for Objects

  29. The Power of Dynamic Languages
        o = {foo: 33}           // Object with 1 field
        o.bar = new Object()    // Object with 2 fields
        o.float = 4.2           // Object with 3 fields
        o.float = "string"      // And you can store anything
  30. Data Representation for Objects
        o = {foo: 33}
        o.bar = new Object()
        o.baz = "string"
        o.float = 4.2
    Diagram labels: Obj; slots 1 2 3 … 8; foo, bar, baz, float; 33, "string", 4.2
    Full Power of Dynamic Languages: Rarely Used
  31. Maps, Hidden Classes, Object Shapes
        o = {foo: 33}
        o.bar = new Object()
        o.baz = "string"
        o.float = 4.2
    Obj: 33, 4.2, "string"
    Shape: 1: foo(int), 2: baz(ptr), 3: float(float), 4: bar(ptr)
    Combined with inline caches, a field access is a simple access at memory offset
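
The shape idea from slide 31 can be sketched in a few lines. This is a simplified model and not how any particular VM implements it: the slot types shown on the slide (int, ptr, float) are left out, and `Shape`, `DynObject`, `EMPTY_SHAPE`, and `makeFieldRead` are hypothetical names.

```javascript
// A shape maps field names to storage slots. Objects that receive the same
// fields in the same order end up sharing one shape.
class Shape {
  constructor(slots = new Map()) {
    this.slots = slots;           // field name -> index into storage
    this.transitions = new Map(); // field name -> shape after adding it
  }
  withField(name) {
    let next = this.transitions.get(name);
    if (next === undefined) {
      next = new Shape(new Map(this.slots).set(name, this.slots.size));
      this.transitions.set(name, next);
    }
    return next;
  }
}

const EMPTY_SHAPE = new Shape();

class DynObject {
  constructor() { this.shape = EMPTY_SHAPE; this.storage = []; }
  write(name, value) {
    if (!this.shape.slots.has(name)) this.shape = this.shape.withField(name);
    this.storage[this.shape.slots.get(name)] = value;
  }
  read(name) {
    const slot = this.shape.slots.get(name);
    return slot === undefined ? undefined : this.storage[slot];
  }
}

// Inline cache for one field-read site: a shape check, then a direct slot access.
function makeFieldRead(name) {
  let cachedShape = null;
  let cachedSlot = -1;
  return (obj) => {
    if (obj.shape !== cachedShape) { // miss: find the slot and remember it
      cachedShape = obj.shape;
      cachedSlot = obj.shape.slots.get(name);
    }
    return obj.storage[cachedSlot];  // hit: plain indexed access
  };
}

const a = new DynObject(); a.write("foo", 33); a.write("bar", 1);
const b = new DynObject(); b.write("foo", 7);  b.write("bar", 2);
const readBar = makeFieldRead("bar");
```

Because `a` and `b` share a shape, the second `readBar` call skips the name lookup and goes straight to the cached slot.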
  32. Our Hero brings Structure and Logic to the Chaos
  33. METAPROGRAMMING
    Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
    Reflection Metaprogramming; Excellent Performance
  34. Reflective Method Invocation
        cnt.invoke("+", [1])
    Let’s look at the addition first
  35. Reflective Method Invocation
        cnt.invoke("+", [1])
        cnt.+(1)
    Int.+ method
  36. Optimizing Reflective Method Invocation
        cnt.invoke("+", [1])
    Cache entries: + string, Int.+ method; * string, Int.* method
    Nesting of Lookup Caches Eliminates Potential Variability
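
The nesting of lookup caches on slide 36 can be imitated as follows. This is a sketch under the assumption that one closure models one `invoke` call site; in a real VM both cache levels are guards in compiled code. `Int`, `lookupMethod`, `makeInvokeSite`, and `fullLookups` are hypothetical names.

```javascript
// A tiny integer wrapper with methods literally named "+" and "*".
class Int {
  constructor(value) { this.value = value; }
  "+"(other) { return new Int(this.value + other); }
  "*"(other) { return new Int(this.value * other); }
}

// Slow path: resolve the method by name on the receiver.
let fullLookups = 0;
function lookupMethod(receiver, name) {
  fullLookups++;
  const method = receiver[name];
  if (typeof method !== "function") throw new TypeError(`no method ${name}`);
  return method;
}

// Call site for receiver.invoke(name, args): the outer cache is keyed by the
// method name string, each inner cache by the receiver's class.
function makeInvokeSite() {
  const byName = new Map(); // name -> Map(class -> method)
  return function invoke(receiver, name, args) {
    let byClass = byName.get(name);
    if (byClass === undefined) byName.set(name, (byClass = new Map()));
    let method = byClass.get(receiver.constructor);
    if (method === undefined) {
      method = lookupMethod(receiver, name);
      byClass.set(receiver.constructor, method);
    }
    return method.apply(receiver, args);
  };
}

const invokeSite = makeInvokeSite();
let cnt = new Int(0);
for (let i = 0; i < 100; i++) cnt = invokeSite(cnt, "+", [1]);
const doubled = invokeSite(cnt, "*", [2]);
```

After warm-up, the reflective call costs two cache checks, which a JIT compiler can treat like an ordinary direct send.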
  37. Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and without Compromises. Marr, S., Seaton, C. & Ducasse, S. (2015). PLDI’15
    Simple Metaprogramming: Zero Overhead
    http://stefan-marr.de/papers/pldi-marr-et-al-zero-overhead-metaprogramming-artifacts/
    Reflective & Direct: Identical Machine Code!
  38. Ruby Image Processing Code using Metaprogramming
    [Plot: Speedup over unoptimized (higher is better), axis from 10.0 to 20.0. Benchmarks: Compose Color Burn, Compose Color Dodge, Compose Darken, Compose Difference, Compose Exclusion, Compose Hard Light, Compose Hard Mix, Compose Lighten, Compose Linear Burn, Compose Linear Dodge, Compose Linear Light, Compose Multiply, Compose Normal, Compose Overlay, Compose Pin Light, Compose Screen, Compose Soft Light, Compose Vivid Light]
  39. METAOBJECT PROTOCOLS
    Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
    Excellent Performance; Metaobject Protocols
  40. Metaobject Protocols
        WriteLogging extends Metaclass {
          writeToField(obj, fieldName, value) {
            console.log(`${fieldName}: ${value}`)
            obj.setField(fieldName, value)
          }
        }
    Redefine the Language from within the Language
  41. Problem
        obj.field = 12;
    turns into
        writeToField(obj, "field", 12)

        fn writeToField(obj, fieldName, value) {
          console.log(`${fieldName}: ${value}`)
          obj.setField(fieldName, value)
        }
    AOP
    Looks very Hard!
  42. Ownership-based Metaobject Protocol
    Building a Safe Actor Framework
        class ActorDomain extends Domain {
          fn writeToField(obj, fieldIdx, value) {
            if (Domain.current() == this) {
              obj.setField(fieldIdx, value);
            } else {
              throw new IsolationError(obj);
            }
          }
          /* ... */
        }
    http://stefan-marr.de/research/omop/
  43. An Actor Example
        actor.fieldA := 1
    semantic depends on metaobject
    Cache entries: AD.writeToField; Std write
    Cache Desired Language Semantics
    Eliminates Potential Variability
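
The ownership check from slide 42 and the caching idea from slide 43 can be combined in one small sketch. It is an illustration under simplifying assumptions, not the OMOP implementation: `currentDomain`, `makeObject`, and `makeWriteSite` are hypothetical names, and the write site caches the handler per owner-domain class much like a monomorphic inline cache.

```javascript
class IsolationError extends Error {}

// Hypothetical stand-in for Domain.current(): the domain of the running code.
let currentDomain = null;

class Domain {
  // Standard semantics: just write the field.
  writeToField(obj, field, value) { obj.fields[field] = value; }
}

class ActorDomain extends Domain {
  // Actor semantics: only code running inside the owner may write.
  writeToField(obj, field, value) {
    if (currentDomain !== this) {
      throw new IsolationError("object is owned by another actor");
    }
    obj.fields[field] = value;
  }
}

// Every object records its owner domain, which decides how writes behave.
function makeObject(owner) { return { owner, fields: {} }; }

// Write site for `obj.field := value`, caching the handler per domain class.
function makeWriteSite(field) {
  let cachedClass = null;
  let cachedHandler = null;
  return (obj, value) => {
    const cls = obj.owner.constructor;
    if (cls !== cachedClass) { // miss: fetch the handler for this domain class
      cachedClass = cls;
      cachedHandler = cls.prototype.writeToField;
    }
    cachedHandler.call(obj.owner, obj, field, value);
  };
}

const actorA = new ActorDomain();
const actorB = new ActorDomain();
const objOfA = makeObject(actorA);
const writeFieldA = makeWriteSite("fieldA");

currentDomain = actorA;
writeFieldA(objOfA, 1); // allowed: the code runs inside the owner

currentDomain = actorB;
let rejected = false;
try {
  writeFieldA(objOfA, 2); // another actor tries to write
} catch (e) {
  rejected = e instanceof IsolationError;
}
```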
  44. OMOP Overhead
    meta-tracing: Overhead: 4% (min. -1%, max. 19%)
    partial evaluation: Overhead: 9% (min. -7%, max. 38%)
    [Two plots, SOMMT (axis from 1.00 to 1.20) and SOMPE (axis from 0.8 to 1.4): Runtime Ratio to run without OMOP. Benchmarks: Bounce, BubbleSort, Dispatch, Fannkuch, Fibonacci, FieldLoop, IntegerLoop, List, Loop, Permute, QuickSort, Recurse, Storage, Sum, Towers, TreeSort, WhileLoop, DeltaBlue, Mandelbrot, NBody, Richards]
  45. Our Heroes
    Eliminates Potential Variability
    Makes it Fast, With a little Luck
  46. GRADUAL TYPING
    Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
    Excellent Performance; Gradual Typing
  47. Gradual Typing without Run-Time Semantics
        async addMessage(user: User, message) {
          const msg =
            `<img src="/image/${user.profilePicture}"> ${message}</span>`;
          this.outputElem.insertAdjacentHTML('beforeend', msg);
        }
    Very Useful in Practice. And rather popular.
  48. Transient Gradual Typing
        type Vehicle = interface {
          registration
          registerTo(_)
        }
        type Department = interface {
          code
        }
        var companyCar: Vehicle := object {
          method registerTo(d: Department) {
            print "{d.code}"
          }
        }
        companyCar.registerTo(
          object { var name := "R&D" })
    Types are shallow. Method names matter, but arguments don’t.
    Object only has name, no code method
    Assignment to registerTo(d: Department) should error
  49. Transient Gradual Typing
        tmp := object {
          method registerTo(d) {
            typeCheck d is Department
            print "{d.code}"
          }
        }
        typeCheck tmp is Vehicle
        var companyCar = tmp
        companyCar.registerTo(
          object { var name := "R&D" })
    Very simple semantics. Other Gradual systems have blame, and are more complex.
    Possibly many many checks.
    Looks very Hard!
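
The inserted checks from slide 49 can be approximated in JavaScript. This is a sketch under the assumption that a type is only a list of required member names, which matches the slides' remark that types are shallow. The `typeCheck` helper, the type records, and the `registration` value are hypothetical.

```javascript
// A type is just a name plus the member names it requires (shallow check).
const Vehicle = { name: "Vehicle", members: ["registration", "registerTo"] };
const Department = { name: "Department", members: ["code"] };

// Throws unless the value has every member the type asks for.
function typeCheck(value, type) {
  for (const member of type.members) {
    if (!(member in value)) {
      throw new TypeError(`value is not a ${type.name}: missing ${member}`);
    }
  }
  return value;
}

// `var companyCar: Vehicle := object { ... }` becomes a check on assignment.
const companyCar = typeCheck({
  registration: "ABC-123", // hypothetical value, added so the object passes as a Vehicle
  registerTo(d) {
    typeCheck(d, Department); // check inserted at method entry
    return `${d.code}`;
  },
}, Vehicle);

let error = null;
try {
  companyCar.registerTo({ name: "R&D" }); // has `name` but no `code`
} catch (e) {
  error = e;
}
const ok = companyCar.registerTo({ code: "R&D-01" });
```

Every call to `registerTo` repeats the member-by-member check, which is the cost the following slides set out to remove.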
  50. How to get rid of these checks without losing run-time semantics?
    Same code as slide 49.
  51. Shapes to the Rescue
    Shape: 1: foo(int), 2: baz(ptr), 3: float(float), 4: bar(ptr)
    Shape: 1: foo(int), 2: baz(ptr), 3: float(float), 4: bar(ptr)
      Implicitly Compatible to:
      - Type 1
      - Type 2
    1. Check object is compatible
    2. Shape implies compatibility
  52. Final optimized code
        tmp := object {
          method registerTo(d) {
            check d hasShape s1
            print "{d.code}"
          }
        }
        check tmp hasShape s2
        var companyCar = tmp
        companyCar.registerTo(
          object { var name := "R&D" })
    need to do type check only once per lexical location
    s1.code
    s2.registerTo
    JIT Compiler can remove redundant checks
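
The step from "check the object" to "check the shape" can be sketched as follows. This is a simplified model, assuming one shape per distinct list of member names and a per-shape table of already decided types. `Shape`, `shapeFor`, `makeObj`, and `structuralChecks` are hypothetical names.

```javascript
// Simplified stand-in for hidden classes: one Shape per list of member names.
const shapes = new Map();
class Shape {
  constructor(members) {
    this.members = members;      // Set of member names
    this.compatible = new Map(); // type -> cached check result
  }
}
function shapeFor(names) {
  const key = names.join(",");
  let shape = shapes.get(key);
  if (!shape) shapes.set(key, (shape = new Shape(new Set(names))));
  return shape;
}
function makeObj(fields) {
  return { shape: shapeFor(Object.keys(fields)), fields };
}

// The structural check runs once per (shape, type) pair. Afterwards the
// shape alone implies compatibility.
let structuralChecks = 0;
function typeCheck(obj, type) {
  let result = obj.shape.compatible.get(type);
  if (result === undefined) {
    structuralChecks++;
    result = type.members.every((m) => obj.shape.members.has(m));
    obj.shape.compatible.set(type, result);
  }
  if (!result) throw new TypeError(`not a ${type.name}`);
  return obj;
}

const Department = { name: "Department", members: ["code"] };
const depts = [];
for (let i = 0; i < 1000; i++) depts.push(makeObj({ code: `D${i}`, name: "dept" }));
for (const d of depts) typeCheck(d, Department);
const checksAfterLoop = structuralChecks;

let failed = false;
try {
  typeCheck(makeObj({ name: "R&D" }), Department); // different shape, no `code`
} catch (e) {
  failed = e instanceof TypeError;
}
```

A thousand objects of one shape need a single structural check. What remains per call is a shape comparison, the kind of guard a JIT compiler already emits for inline caches.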
  53. Works Well!
    OOPSLA’17, ECOOP’19
  54. Transient Typechecks Are (Almost) Free
  55. Our Heroes
    Polymorphic Inline Caches: Eliminates Potential Variability
    Maps, Hidden Classes, Shapes: Provides a Supporting Structure
    Just-in-time Compilation: Makes it Fast With a little Luck
  56. WARP UP

  57. Things I didn’t talk about
    Failure cases: Deoptimization
      An Efficient Implementation of SELF a Dynamically-Typed Object-Oriented Language Based on Prototypes. Chambers, C., Ungar, D. & Lee, E. (1989). OOPSLA’89
    Object shapes are useful for other things
      Efficient and Thread-Safe Objects for Dynamically-Typed Languages. B. Daloze, S. Marr, D. Bonetta, and H. Mössenböck. OOPSLA'16
    And many other modern optimizations
  58. Our Knights Made it With Some Help of our 90’s Heroes
    Excellent Performance; Reflection Metaprogramming; Gradual Typing; Metaobject Protocols
  59. Key Papers*
    *Lots of necessary work afterwards, but lay foundations
  60. Research and Literature
    • Efficient Implementation of the Smalltalk-80 System. Deutsch, L. P. & Schiffman, A. M. (1984). POPL’84
    • Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches. Hölzle, U., Chambers, C. & Ungar, D. (1991). ECOOP’91
    • Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and without Compromises. Marr, S., Seaton, C. & Ducasse, S. (2015). PLDI’15
    • Optimizing prototypes in V8 https://mathiasbynens.be/notes/prototypes
    • https://mathiasbynens.be/notes/shapes-ics
    • https://mrale.ph/blog/2012/06/03/explaining-js-vms-in-js-inline-caches.html
  61. Research and Literature
    • An Efficient Implementation of SELF a Dynamically-Typed Object-Oriented Language Based on Prototypes. Chambers, C., Ungar, D. & Lee, E. (1989). OOPSLA’89
    • An Object Storage Model for the Truffle Language Implementation Framework. A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, and H. Mössenböck. PPPJ’14.
    • Storage Strategies for Collections in Dynamically Typed Languages. C. F. Bolz, L. Diekmann, and L. Tratt. OOPSLA’13.
    • Memento Mori: Dynamic Allocation-site-based Optimizations. Clifford, D., Payer, H., Stanton, M. & Titzer, B. L. (2015). ISMM’15
  62. Research and Literature
    • Virtual Machine Warmup Blows Hot and Cold. Barrett, E., Bolz-Tereick, C. F., Killick, R., Mount, S. & Tratt, L. (2017). OOPSLA’17
    • Quantifying Performance Changes with Effect Size Confidence Intervals. Kalibera, T. & Jones, R. (2012). Technical Report, University of Kent.
    • Rigorous Benchmarking in Reasonable Time. Kalibera, T. & Jones, R. (2013). ISMM’13
    • How Not to Lie With Statistics: The Correct Way to Summarize Benchmark Results. Fleming, P. J. & Wallace, J. J. (1986). Commun. ACM
    • SIGPLAN Empirical Evaluation Guidelines https://www.sigplan.org/Resources/EmpiricalEvaluation/
    • Systems Benchmarking Crimes, Gernot Heiser https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html
    • Benchmarking Crimes: An Emerging Threat in Systems Security. van der Kouwe, E., Andriesse, D., Bos, H., Giuffrida, C. & Heiser, G. (2018). arxiv:1801.02381
    • http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/
    • Generating an Artefact From a Benchmarking Setup as Part of CI https://stefan-marr.de/2019/05/artifacts-from-ci/