Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "Unoptimizable" Using Old Ideas

Slide 1

Slide 1 text

Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the “Unoptimizable” Using Old Ideas
Stefan Marr
Athens, October 2019
Creative Commons Attribution-ShareAlike 4.0 License

Slide 2

Slide 2 text

Got a Question? Feel free to interrupt me!

Slide 3

Slide 3 text

The “Unoptimizable”

Slide 4

Slide 4 text

The Knights on the Quest for Excellent Performance
[Diagram: Excellent Performance; Reflection/Metaprogramming; Gradual Typing; Metaobject Protocols]

Slide 5

Slide 5 text

Metaprogramming
Reflection, Introspection, Intercession

  obj.invoke("methodName", [arg1])
  obj.getField("name")
  obj.setField("name", value)
  obj.getDefinedFields()
  …

[Diagram: Reflection/Metaprogramming]

Slide 6

Slide 6 text

Metaprogramming
• meth.invoke(): 1.7x slower
• Dynamic Proxies: 7.5x slower
[Diagram: Reflection/Metaprogramming; Excellent Performance; “Don’t Use It Inn”]

Slide 7

Slide 7 text

Metaobject Protocols

  class LoggingClass extends Metaclass {
    writeToField(obj, fieldName, value) {
      console.log(`${fieldName}: ${value}`);
      obj.setField(fieldName, value);
    }
  }

[Diagram: Metaobject Protocols; Excellent Performance]

Slide 8

Slide 8 text

Metaobject Protocols

  class LoggingClass extends Metaclass {
    writeToField(obj, fieldName, value) {
      console.log(`${fieldName}: ${value}`);
      obj.setField(fieldName, value);
    }
  }

[Diagram: Metaobject Protocols; Excellent Performance; AOP]

Slide 9

Slide 9 text

Gradual Typing

  async addMessage(user: User, message) {
    const msg = `
      ${message}`;
    this.outputElem.insertAdjacentHTML('beforeend', msg);
  }

[Diagram: Excellent Performance; Gradual Typing]

Slide 10

Slide 10 text

Gradual Typing

  async addMessage(user: User, message) {
    const msg = `
      ${message}`;
    this.outputElem.insertAdjacentHTML('beforeend', msg);
  }

[Diagram: Excellent Performance; Gradual Typing]
Somewhat true… The whole truth is a little more complex.

Slide 11

Slide 11 text

The Knights Found New Homes
[Diagram: Excellent Performance; Reflection/Metaprogramming; Gradual Typing; Metaobject Protocols; AOP; Land of Engineering Shortcuts]

Slide 12

Slide 12 text

The story could have ended here…

Slide 13

Slide 13 text

If it weren’t for the ’90s

Slide 14

Slide 14 text

An “everything has been done in Lisp” Talk
Smalltalk, Self

Slide 15

Slide 15 text

The (Movie) Heroes of ’91
• Polymorphic Inline Caches: Terminator 2
• Just-in-time Compilation: The Naked Gun 2 1/2
• Maps (Hidden Classes): Star Trek VI

Slide 16

Slide 16 text

Key Papers*
*Lots of necessary work followed, but these laid the foundations

Slide 17

Slide 17 text

POLYMORPHIC INLINE CACHES (PICS)
A technique for lookup caching.

Slide 18

Slide 18 text

A Class Hierarchy of Widgets

  class Widget {
    fitsInto(width) {
      return this.width <= width;
    }
  }
  class Button extends Widget {}
  class RadioButton extends Button {}

  function findAllThatFit(arr, width) {
    const result = [];
    for (const w of arr) {
      if (w.fitsInto(width)) {
        result.push(w);
      }
    }
    return result;
  }

Slide 19

Slide 19 text

Lookups can be frequent and costly

  class Widget {
    fitsInto(width) {
      return this.width <= width;
    }
  }
  class Button extends Widget {}
  class RadioButton extends Button {}

  function findAllThatFit(arr, width) {
    const result = [];
    for (const w of arr) {
      if (w.fitsInto(width)) {
        result.push(w);
      }
    }
    return result;
  }

[Diagram: RadioButton → (superclass) Button → (superclass) Widget, which defines fitsInto]
For each fitsInto call:
• hasMethod: 3x
• getSuperclass: 2x

Slide 20

Slide 20 text

Solution: Lookup Caching
w.fitsInto(width) could be various functions, but we don’t need to do the same lookup repeatedly.
[Diagram: the call site caches a method, plus a further method in case we see a widget of a different class]
PIC: check the receiver and jump to the method directly in machine code
Useful because:
• Most sends are monomorphic
• Few are polymorphic
• And just a couple are megamorphic
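The lookup caching idea from this slide can be sketched in plain JavaScript. This is only a model under stated assumptions: a real VM patches machine code at the call site, while here the cache is an ordinary array. The names `lookup`, `CallSite`, `lookupSteps`, and the limit of 4 entries are hypothetical and not taken from the talk.

```javascript
// Illustrative model of a polymorphic inline cache (PIC).
// Assumption: CallSite, lookup, and MEGAMORPHIC_LIMIT are made-up names.
class Widget {
  constructor(width) { this.width = width; }
  fitsInto(width) { return this.width <= width; }
}
class Button extends Widget {}
class RadioButton extends Button {}

let lookupSteps = 0; // counts classes visited on the slow path

// Slow path: walk the class hierarchy until the method is found.
function lookup(cls, name) {
  let proto = cls.prototype;
  while (proto !== null) {
    lookupSteps++;
    if (Object.prototype.hasOwnProperty.call(proto, name)) return proto[name];
    proto = Object.getPrototypeOf(proto);
  }
  throw new TypeError(`method not found: ${name}`);
}

const MEGAMORPHIC_LIMIT = 4; // assumed threshold; VMs choose their own

class CallSite {
  constructor(name) {
    this.name = name;
    this.entries = []; // one { cls, method } pair per receiver class seen
  }
  send(receiver, ...args) {
    const cls = receiver.constructor;
    for (const entry of this.entries) {
      if (entry.cls === cls) return entry.method.apply(receiver, args); // hit
    }
    const method = lookup(cls, this.name); // miss: do the full lookup once
    if (this.entries.length < MEGAMORPHIC_LIMIT) {
      this.entries.push({ cls, method });
    }
    return method.apply(receiver, args);
  }
}

const site = new CallSite("fitsInto");

function findAllThatFit(arr, width) {
  const result = [];
  for (const w of arr) {
    if (site.send(w, width)) result.push(w);
  }
  return result;
}

const widgets = [new RadioButton(10), new RadioButton(30), new Button(5), new RadioButton(12)];
const fitting = findAllThatFit(widgets, 20);
```

With four receivers of two classes, only the first RadioButton and the first Button take the slow path; the other sends hit the cache.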

Slide 21

Slide 21 text

The Terminator
PIC: Eliminates Potential Variability

Slide 22

Slide 22 text

JUST IN TIME COMPILATION
Generating Machine Code at Run Time

Slide 23

Slide 23 text

Just-in-time Compilation
• Produces optimized native code, avoiding the overhead of interpretation
• At run time, can utilize knowledge about program execution
• Ahead-of-time compilation, i.e., classic static compilation, can only guess how a program is used
(Missing slide, added after the talk)

Slide 24

Slide 24 text

With PICs, we can know

  class Widget {
    fitsInto(width) {
      return this.width <= width;
    }
  }
  class Button extends Widget {}
  class RadioButton extends Button {}

  function findAllThatFit(arr, width) {
    const result = [];
    for (const w of arr) {
      if (w.fitsInto(width)) {
        result.push(w);
      }
    }
    return result;
  }

[Annotations on the code: RadioButton, Widget.fitsInto, Array, RadioButton, Integer]

Slide 25

Slide 25 text

And Generate Efficient Code

  function findAllThatFit(arr, width) {
    const result = [];
    for (const w of arr) {
      if (w.width /* field read */ <= /* int comparison */ width) {
        result.push(w);
      }
    }
    return result;
  }

• Inlined fitsInto
• Specialized to types
• Resulting in efficient machine code
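As a rough illustration of what "inlined and specialized to types" could mean, the sketch below writes the specialized code by hand in JavaScript. It assumes the profile only ever saw RadioButton receivers and integer widths. The function names and the explicit fallback branches are assumptions for this sketch; a real JIT compiler would typically deoptimize back to generic code when a guard fails rather than branch like this.

```javascript
// Hand-written model of speculative specialization; not actual JIT output.
class Widget {
  constructor(width) { this.width = width; }
  fitsInto(width) { return this.width <= width; }
}
class Button extends Widget {}
class RadioButton extends Button {}

// Generic version: every iteration performs a full method dispatch.
function findAllThatFitGeneric(arr, width) {
  const result = [];
  for (const w of arr) {
    if (w.fitsInto(width)) result.push(w);
  }
  return result;
}

// Specialized version (hypothetical): guarded by the types seen so far.
function findAllThatFitSpecialized(arr, width) {
  if (!Array.isArray(arr) || !Number.isInteger(width)) {
    return findAllThatFitGeneric(arr, width); // guard failed: generic path
  }
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    const w = arr[i];
    if (w.constructor === RadioButton) {
      // Inlined body of Widget.fitsInto: a field read and a comparison.
      if (w.width <= width) result.push(w);
    } else if (w.fitsInto(width)) {
      // Unexpected receiver class: fall back to a normal send.
      result.push(w);
    }
  }
  return result;
}

const radios = [new RadioButton(10), new RadioButton(30), new RadioButton(20)];
const mixed = [new RadioButton(10), new Button(50), new Widget(5)];
const fastResult = findAllThatFitSpecialized(radios, 20);
const mixedResult = findAllThatFitSpecialized(mixed, 20);
```

The guards keep the specialized version correct even when the speculation turns out to be wrong, which is why both versions return the same widgets for the mixed array.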

Slide 26

Slide 26 text

Polymorphic Inline Caches
• Give us type information
• Enable method inlining

Slide 27

Slide 27 text

Our Hero has a lot of Luck and Achieves his Goal
• Relies on developers not utilizing flexibility
• And on understanding ’90s humor

Slide 28

Slide 28 text

MAPS, HIDDEN CLASSES, OBJECT SHAPES
Structural Information for Objects

Slide 29

Slide 29 text

The Power of Dynamic Languages

  o = {foo: 33}          // Object with 1 field
  o.bar = new Object()   // Object with 2 fields
  o.float = 4.2          // Object with 3 fields
  o.float = "string"     // And you can store anything

Slide 30

Slide 30 text

Data Representation for Objects

  o = {foo: 33}
  o.bar = new Object()
  o.baz = "string"
  o.float = 4.2

[Diagram: Obj as a table with slots 1, 2, 3, … 8; keys foo, bar, baz, float; values 33, "string", 4.2]
Full Power of Dynamic Languages: Rarely Used

Slide 31

Slide 31 text

Maps, Hidden Classes, Object Shapes

  o = {foo: 33}
  o.bar = new Object()
  o.baz = "string"
  o.float = 4.2

[Diagram: Obj holding the values 33, 4.2, "string", next to its Shape]
Shape:
  1: foo(int)
  2: baz(ptr)
  3: float(float)
  4: bar(ptr)
Combined with inline caches, a field access is a simple access at a memory offset.
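A minimal sketch of how shapes and an inline cache might work together, written in JavaScript. It is a simplification under stated assumptions: the `Shape`, `Obj`, and `makeFieldRead` names are hypothetical, and unlike the shape on the slide, this model records only field names, not field types.

```javascript
// Simplified model of maps / hidden classes / object shapes.
// Assumption: all names here are illustrative, not a VM's actual layout.
class Shape {
  constructor(fields = []) {
    this.fields = fields;         // field names in storage order
    this.transitions = new Map(); // field name -> successor shape
  }
  indexOf(name) { return this.fields.indexOf(name); }
  withField(name) {
    let next = this.transitions.get(name);
    if (!next) {
      next = new Shape([...this.fields, name]);
      this.transitions.set(name, next);
    }
    return next; // objects built the same way share the same shape
  }
}
const EMPTY_SHAPE = new Shape();

class Obj {
  constructor() {
    this.shape = EMPTY_SHAPE;
    this.slots = []; // values only; the names live in the shape
  }
  set(name, value) {
    let idx = this.shape.indexOf(name);
    if (idx < 0) {
      this.shape = this.shape.withField(name);
      idx = this.slots.length;
    }
    this.slots[idx] = value;
  }
}

let slowLookups = 0;

// One field-read site with an inline cache: remembers shape and slot index.
function makeFieldRead(name) {
  let cachedShape = null;
  let cachedIndex = -1;
  return function read(obj) {
    if (obj.shape !== cachedShape) { // miss: search the shape once
      slowLookups++;
      cachedShape = obj.shape;
      cachedIndex = obj.shape.indexOf(name);
    }
    return cachedIndex < 0 ? undefined : obj.slots[cachedIndex];
  };
}

function makeObj(floatValue) {
  const o = new Obj();
  o.set("foo", 33);
  o.set("bar", {});
  o.set("baz", "string");
  o.set("float", floatValue);
  return o;
}

const a = makeObj(4.2);
const b = makeObj(1.5);
const readFloat = makeFieldRead("float");
const fromA = readFloat(a);
const fromB = readFloat(b);
```

Because `a` and `b` were built by the same sequence of field additions, they end up with the identical shape, so the second read only compares the shape and indexes into the slot array.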

Slide 32

Slide 32 text

Our Hero brings Structure and Logic to the Chaos

Slide 33

Slide 33 text

METAPROGRAMMING
Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
[Diagram: Reflection/Metaprogramming; Excellent Performance]

Slide 34

Slide 34 text

Reflective Method Invocation

  cnt.invoke("+", [1])

Let’s look at the addition first

Slide 35

Slide 35 text

Reflective Method Invocation

  cnt.invoke("+", [1])

[Diagram: lookup cache for the send cnt.+(1): Int.+ → method]

Slide 36

Slide 36 text

Optimizing Reflective Method Invocation

  cnt.invoke("+", [1])

[Diagram: nested lookup caches. Outer cache on the selector string ("+", "*"); inner caches on the receiver (Int.+ → method, Int.* → method)]
Nesting of Lookup Caches Eliminates Potential Variability
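The nesting of lookup caches can be modelled in a few lines of JavaScript. This is a sketch under assumptions, not the implementation from the paper: `Int`, `lookup`, and `makeInvokeSite` are hypothetical names, and the caches are plain `Map`s rather than guards in generated code.

```javascript
// Model of a reflective invoke site with two nested caches.
// Assumption: Int, lookup, and makeInvokeSite are illustrative names.
class Int {
  constructor(v) { this.v = v; }
  "+"(other) { return new Int(this.v + other); }
  "*"(other) { return new Int(this.v * other); }
}

let slowLookups = 0;

// Slow path: resolve a selector string to a method of the given class.
function lookup(cls, selector) {
  slowLookups++;
  const method = cls.prototype[selector];
  if (typeof method !== "function") {
    throw new TypeError(`no method ${selector}`);
  }
  return method;
}

function makeInvokeSite() {
  const bySelector = new Map(); // outer cache: selector string -> inner cache
  return function invoke(receiver, selector, args) {
    let byClass = bySelector.get(selector);
    if (!byClass) {
      byClass = new Map(); // inner cache: receiver class -> method
      bySelector.set(selector, byClass);
    }
    const cls = receiver.constructor;
    let method = byClass.get(cls);
    if (!method) {
      method = lookup(cls, selector);
      byClass.set(cls, method);
    }
    return method.apply(receiver, args);
  };
}

const invoke = makeInvokeSite();
let cnt = new Int(0);
for (let i = 0; i < 100; i++) {
  cnt = invoke(cnt, "+", [1]); // same selector and class every time
}
const doubled = invoke(cnt, "*", [2]);
```

After the first iteration, the reflective call resolves through two cache hits, which is the property a compiler can exploit to treat it like a direct send.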

Slide 37

Slide 37 text

Simple Metaprogramming: Zero Overhead
Reflective & Direct: Identical Machine Code!
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and without Compromises. Marr, S., Seaton, C. & Ducasse, S. (2015). PLDI’15
http://stefan-marr.de/papers/pldi-marr-et-al-zero-overhead-metaprogramming-artifacts/

Slide 38

Slide 38 text

Ruby Image Processing Code using Metaprogramming
[Plot: speedup over unoptimized (higher is better), axis from 10.0 to 20.0, for the benchmarks Compose Color Burn, Color Dodge, Darken, Difference, Exclusion, Hard Light, Hard Mix, Lighten, Linear Burn, Linear Dodge, Linear Light, Multiply, Normal, Overlay, Pin Light, Screen, Soft Light, and Vivid Light]

Slide 39

Slide 39 text

METAOBJECT PROTOCOLS
Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
[Diagram: Excellent Performance; Metaobject Protocols]

Slide 40

Slide 40 text

Metaobject Protocols

  class WriteLogging extends Metaclass {
    writeToField(obj, fieldName, value) {
      console.log(`${fieldName}: ${value}`);
      obj.setField(fieldName, value);
    }
  }

Redefine the Language from within the Language

Slide 41

Slide 41 text

Problem

  obj.field = 12;

turns into

  writeToField(obj, "field", 12)

  function writeToField(obj, fieldName, value) {
    console.log(`${fieldName}: ${value}`);
    obj.setField(fieldName, value);
  }

AOP
Looks very Hard!

Slide 42

Slide 42 text

Ownership-based Metaobject Protocol
Building a Safe Actor Framework

  class ActorDomain extends Domain {
    writeToField(obj, fieldIdx, value) {
      if (Domain.current() === this) {
        obj.setField(fieldIdx, value);
      } else {
        throw new IsolationError(obj);
      }
    }
    /* ... */
  }

http://stefan-marr.de/research/omop/

Slide 43

Slide 43 text

An Actor Example

  actor.fieldA := 1

The semantics depend on the metaobject.
[Diagram: the write site caches either AD.writeToField or the standard write]
Cache Desired Language Semantics
Eliminates Potential Variability
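To make "cache desired language semantics" a little more concrete, here is a hedged JavaScript model of a field-write site that caches the handler chosen by the owning domain. It simplifies the slide's code: `Domain.current` is a plain property instead of a method, objects carry a hypothetical `owner` and `fields` property, and `makeWriteSite` is an invented helper.

```javascript
// Model of a write site whose semantics are chosen by a metaobject (domain).
// Assumption: owner, fields, and makeWriteSite are illustrative names.
class IsolationError extends Error {}

class Domain {
  // Standard semantics: just write the field.
  writeToField(obj, name, value) { obj.fields[name] = value; }
}
Domain.current = null; // the domain that is currently executing

class ActorDomain extends Domain {
  // Actor semantics: only the owning actor may write.
  writeToField(obj, name, value) {
    if (Domain.current !== this) {
      throw new IsolationError("object is owned by another actor");
    }
    obj.fields[name] = value;
  }
}

let handlerLookups = 0;

function makeWriteSite(name) {
  const cache = new Map(); // domain class -> cached writeToField handler
  return function write(obj, value) {
    const domainClass = obj.owner.constructor;
    let handler = cache.get(domainClass);
    if (!handler) {
      handlerLookups++; // slow path: ask the metaobject which semantics apply
      handler = domainClass.prototype.writeToField;
      cache.set(domainClass, handler);
    }
    handler.call(obj.owner, obj, name, value);
  };
}

const plainDomain = new Domain();
const actorDomain = new ActorDomain();
const free = { owner: plainDomain, fields: {} };
const guarded = { owner: actorDomain, fields: {} };

const writeFieldA = makeWriteSite("fieldA");
writeFieldA(free, 1);          // standard write
Domain.current = actorDomain;
writeFieldA(guarded, 2);       // allowed: the owner is running
Domain.current = plainDomain;
let rejected = false;
try {
  writeFieldA(guarded, 3);     // rejected: a different domain is running
} catch (e) {
  rejected = e instanceof IsolationError;
}
```

Once the handler for a domain class is cached, the write site no longer needs a general metaobject lookup, which is what lets a compiler inline either the standard write or the ownership check.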

Slide 44

Slide 44 text

OMOP Overhead
• meta-tracing (SOMMT): overhead 4% (min. -1%, max. 19%)
• partial evaluation (SOMPE): overhead 9% (min. -7%, max. 38%)
[Plots: runtime ratio to run without OMOP, per benchmark: Bounce, BubbleSort, Dispatch, Fannkuch, Fibonacci, FieldLoop, IntegerLoop, List, Loop, Permute, QuickSort, Recurse, Storage, Sum, Towers, TreeSort, WhileLoop, DeltaBlue, Mandelbrot, NBody, Richards]

Slide 45

Slide 45 text

Our Heroes
• Eliminates Potential Variability
• Makes it Fast, With a little Luck

Slide 46

Slide 46 text

GRADUAL TYPING
Is there hope for our Knight from the Land of Engineering Shortcuts to find the true treasure?
[Diagram: Excellent Performance; Gradual Typing]

Slide 47

Slide 47 text

Gradual Typing without Run-Time Semantics

  async addMessage(user: User, message) {
    const msg = `
      ${message}`;
    this.outputElem.insertAdjacentHTML('beforeend', msg);
  }

Very useful in practice. And rather popular.

Slide 48

Slide 48 text

Transient Gradual Typing

  type Vehicle = interface {
    registration
    registerTo(_)
  }
  type Department = interface {
    code
  }

  var companyCar: Vehicle := object {
    method registerTo(d: Department) {
      print "{d.code}"
    }
  }

  companyCar.registerTo(
    object { var name := "R&D" })

• Types are shallow. Method names matter, but arguments don’t.
• Object only has name, no code method
• Assignment to registerTo(d: Department) should error

Slide 49

Slide 49 text

Transient Gradual Typing

  tmp := object {
    method registerTo(d) {
      typeCheck d is Department
      print "{d.code}"
    }
  }
  typeCheck tmp is Vehicle
  var companyCar := tmp

  companyCar.registerTo(
    object { var name := "R&D" })

• Very simple semantics. Other gradual systems have blame and are more complex.
• Possibly many, many checks. Looks very Hard!
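The inserted checks can be approximated in JavaScript to show how shallow they are. This is a sketch under assumptions: a type is modelled as a list of required member names, `typeCheck` and `checksRun` are hypothetical, and the `registration` value is added here only so that the Vehicle check has something to find.

```javascript
// Model of transient (shallow) type checks.
// Assumption: a type is just a list of member names; no argument types are checked.
const Vehicle = ["registration", "registerTo"];
const Department = ["code"];

let checksRun = 0;

function typeCheck(value, type, typeName) {
  checksRun++;
  for (const member of type) {
    if (!(member in value)) {
      throw new TypeError(`${typeName} requires member ${member}`);
    }
  }
  return value;
}

const tmp = {
  registration: "hypothetical-plate", // assumed value, not from the slide
  registerTo(d) {
    typeCheck(d, Department, "Department"); // check inserted for the typed parameter
    return d.code;
  },
};
const companyCar = typeCheck(tmp, Vehicle, "Vehicle"); // check inserted for the typed variable

let error = null;
try {
  companyCar.registerTo({ name: "R&D" }); // has name, but no code
} catch (e) {
  error = e;
}
const code = companyCar.registerTo({ code: "RD" });
```

Every typed variable and parameter gets its own check, so a program with many annotations runs many such checks unless the implementation can remove them.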

Slide 50

Slide 50 text

How to get rid of these checks without losing run-time semantics?

  tmp := object {
    method registerTo(d) {
      typeCheck d is Department
      print "{d.code}"
    }
  }
  typeCheck tmp is Vehicle
  var companyCar := tmp

  companyCar.registerTo(
    object { var name := "R&D" })

Slide 51

Slide 51 text

Shapes to the Rescue
[Diagram: the shape, first plain, then annotated with the types it is implicitly compatible to]
Shape:
  1: foo(int)
  2: baz(ptr)
  3: float(float)
  4: bar(ptr)
  Implicitly compatible to:
  - Type 1
  - Type 2
1. Check object is compatible
2. Shape implies compatibility

Slide 52

Slide 52 text

Final optimized code

  tmp := object {
    method registerTo(d) {
      check d hasShape s1
      print "{d.code}"
    }
  }
  check tmp hasShape s2
  var companyCar := tmp

  companyCar.registerTo(
    object { var name := "R&D" })

• Need to do the type check only once per lexical location
• Shapes: s1.code, s2.registerTo
• JIT compiler can remove redundant checks
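One way to picture "shape implies compatibility" is a check site that remembers its verdict per shape. The JavaScript below is a sketch under assumptions: objects carry an explicit `shape` property, and `Shape`, `isCompatible`, and `makeCheckSite` are hypothetical names rather than anything from the papers.

```javascript
// Model of a type check that is cached per shape at one lexical location.
// Assumption: every object exposes its shape; names are illustrative.
class Shape {
  constructor(members) { this.members = new Set(members); }
}

let structuralChecks = 0;

// Slow path: compare the shape's members against the type's members.
function isCompatible(shape, type) {
  structuralChecks++;
  return type.every((member) => shape.members.has(member));
}

// One check site per lexical location; the verdict is cached per shape.
function makeCheckSite(type, typeName) {
  const verdicts = new Map(); // shape -> boolean
  return function check(obj) {
    let ok = verdicts.get(obj.shape);
    if (ok === undefined) {
      ok = isCompatible(obj.shape, type);
      verdicts.set(obj.shape, ok);
    }
    if (!ok) throw new TypeError(`value is not a ${typeName}`);
    return obj;
  };
}

const Department = ["code"];
const checkDepartment = makeCheckSite(Department, "Department");

const deptShape = new Shape(["code"]);
const otherShape = new Shape(["name"]);

const departments = Array.from({ length: 1000 }, (_, i) => ({
  shape: deptShape,
  code: `D${i}`,
}));
for (const d of departments) {
  checkDepartment(d); // only the first object triggers a structural check
}

let rejected = false;
try {
  checkDepartment({ shape: otherShape, name: "R&D" });
} catch (e) {
  rejected = e instanceof TypeError;
}
```

For the thousand objects sharing `deptShape`, the remaining work per check is a single shape comparison, which is the kind of guard a JIT compiler can often merge with the shape checks it already performs for field access.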

Slide 53

Slide 53 text

Works Well!
OOPSLA’17, ECOOP’19

Slide 54

Slide 54 text

Transient Typechecks Are (Almost) Free

Slide 55

Slide 55 text

Our Heroes
• Polymorphic Inline Caches: Eliminates Potential Variability
• Maps, Hidden Classes, Shapes: Provides a Supporting Structure
• Just-in-time Compilation: Makes it Fast, With a little Luck

Slide 56

Slide 56 text

WRAP UP

Slide 57

Slide 57 text

Things I didn’t talk about
• Failure cases: Deoptimization
  An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes. Chambers, C., Ungar, D. & Lee, E. (1989). OOPSLA’89
• Object shapes are useful for other things
  Efficient and Thread-Safe Objects for Dynamically-Typed Languages. Daloze, B., Marr, S., Bonetta, D. & Mössenböck, H. (2016). OOPSLA’16
• And many other modern optimizations

Slide 58

Slide 58 text

Our Knights Made it With Some Help of our ’90s Heroes
[Diagram: Excellent Performance; Reflection/Metaprogramming; Gradual Typing; Metaobject Protocols]

Slide 59

Slide 59 text

Key Papers*
*Lots of necessary work followed, but these laid the foundations

Slide 60

Slide 60 text

Research and Literature
• Efficient Implementation of the Smalltalk-80 System. Deutsch, L. P. & Schiffman, A. M. (1984). POPL’84
• Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches. Hölzle, U., Chambers, C. & Ungar, D. (1991). ECOOP’91
• Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and without Compromises. Marr, S., Seaton, C. & Ducasse, S. (2015). PLDI’15
• Optimizing prototypes in V8 https://mathiasbynens.be/notes/prototypes
• https://mathiasbynens.be/notes/shapes-ics
• https://mrale.ph/blog/2012/06/03/explaining-js-vms-in-js-inline-caches.html

Slide 61

Slide 61 text

Research and Literature
• An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes. Chambers, C., Ungar, D. & Lee, E. (1989). OOPSLA’89
• An Object Storage Model for the Truffle Language Implementation Framework. Wöß, A., Wirth, C., Bonetta, D., Seaton, C., Humer, C. & Mössenböck, H. (2014). PPPJ’14
• Storage Strategies for Collections in Dynamically Typed Languages. Bolz, C. F., Diekmann, L. & Tratt, L. (2013). OOPSLA’13
• Memento Mori: Dynamic Allocation-site-based Optimizations. Clifford, D., Payer, H., Stanton, M. & Titzer, B. L. (2015). ISMM’15

Slide 62

Slide 62 text

Research and Literature
• Virtual Machine Warmup Blows Hot and Cold. Barrett, E., Bolz-Tereick, C. F., Killick, R., Mount, S. & Tratt, L. (2017). OOPSLA’17
• Quantifying Performance Changes with Effect Size Confidence Intervals. Kalibera, T. & Jones, R. (2012). Technical Report, University of Kent
• Rigorous Benchmarking in Reasonable Time. Kalibera, T. & Jones, R. (2013). ISMM’13
• How Not to Lie With Statistics: The Correct Way to Summarize Benchmark Results. Fleming, P. J. & Wallace, J. J. (1986). Commun. ACM
• SIGPLAN Empirical Evaluation Guidelines https://www.sigplan.org/Resources/EmpiricalEvaluation/
• Systems Benchmarking Crimes, Gernot Heiser https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html
• Benchmarking Crimes: An Emerging Threat in Systems Security. van der Kouwe, E., Andriesse, D., Bos, H., Giuffrida, C. & Heiser, G. (2018). arXiv:1801.02381
• http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/
• Generating an Artefact From a Benchmarking Setup as Part of CI https://stefan-marr.de/2019/05/artifacts-from-ci/