Slide 1

Slide 1 text

HERMES
BETTER PERFORMANCE WITH RUNTIME BYTECODE TRANSLATION
Tzvetan Mikov, Meta

Slide 2

Slide 2 text

REMEMBER THE SAD TALE OF JOE NATIVE FROM LAST YEAR?

Slide 3

Slide 3 text

HOW DO YOU THINK HE IS DOING NOW, A YEAR LATER?
• Probably still not very happy, since we haven’t released Static Hermes yet

Slide 4

Slide 4 text

WALK DOWN MEMORY LANE: WHAT MAKES HERMES DIFFERENT?

Slide 5

Slide 5 text

HERMES
JavaScript engine for React Native
Optimized for mobile
Low runtime resource consumption
Extremely fast startup

Slide 6

Slide 6 text

WEB JS ENGINE: SPECULATIVE EXECUTION

Slide 7

Slide 7 text

HERMES PIPELINE: AHEAD OF TIME OPTIMIZATIONS

Slide 8

Slide 8 text

KEY FEATURES
AOT compilation to bytecode
Optimization happens once, before execution
Lightweight runtime
Low memory footprint

Slide 9

Slide 9 text

STATIC HERMES BUILDS UPON HERMES
• Understands type annotations
• Great performance by compiling typed JS code ahead of time
• Emits typed bytecode or native machine code
• State of the art compiler pipeline
• Leverages the best production native compiler: LLVM

Slide 10

Slide 10 text

STATIC HERMES COMPILER PIPELINE

Slide 11

Slide 11 text

STATIC HERMES: LOGISTICAL CHALLENGES
• Existing JS build pipelines do not preserve type annotations
• Shipping native code makes OTA updates harder
• A lot of untyped code still exists. Amdahl’s law:
  • The performance of a system is limited by untyped JavaScript.

Slide 12

Slide 12 text

STATIC HERMES: LOGISTICAL CHALLENGES
• Existing JS build pipelines do not preserve type annotations
• Shipping native code makes OTA updates harder
• A lot of untyped code still exists. Amdahl’s law:
  • The performance of a system is limited by untyped JavaScript.
  • The performance of a system is limited by its slowest part.
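Amdahl's law can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `amdahlSpeedup` and the fractions and speedup factors are hypothetical numbers, not measurements from the talk.

```typescript
// Amdahl's law: overall speedup when only part of the work gets faster.
// `acceleratedFraction` is the share of run time spent in the sped-up part
// (here: typed code); `partSpeedup` is how much faster that part becomes.
function amdahlSpeedup(acceleratedFraction: number, partSpeedup: number): number {
  if (acceleratedFraction < 0 || acceleratedFraction > 1) {
    throw new RangeError("acceleratedFraction must be between 0 and 1");
  }
  if (partSpeedup <= 0) {
    throw new RangeError("partSpeedup must be positive");
  }
  return 1 / ((1 - acceleratedFraction) + acceleratedFraction / partSpeedup);
}

// Hypothetical app: 30% of the time is in typed code that becomes 10x faster.
const mostlyUntyped = amdahlSpeedup(0.3, 10);

// Same 10x gain, but 90% of the time is in typed code.
const mostlyTyped = amdahlSpeedup(0.9, 10);

// Even an infinitely fast typed part cannot beat 1 / 0.7 when 70% is untyped.
const upperBound = amdahlSpeedup(0.3, Infinity);
```

With these made-up inputs, the mostly untyped app gains well under 2x overall, which is why improving untyped performance matters as well.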

Slide 13

Slide 13 text

OUR SOLUTION: BYTECODE TRANSLATION ON DEVICE

Slide 14

Slide 14 text

BYTECODE TRANSLATION ON DEVICE
• Bytecode is translated to machine instructions at runtime
• Ship bytecode like we do today; OTA updates work
• Improved untyped performance
• Excellent typed performance

Slide 15

Slide 15 text

WAIT, ISN’T THAT … A JIT?

Slide 16

Slide 16 text

WAIT, ISN’T THAT … A JIT?
• Technically, yes
• In JavaScript “JIT” tends to mean a very complex speculative runtime compiler
• Bytecode translation is very lightweight by comparison
• Designed for the Hermes AOT pipeline

Slide 17

Slide 17 text

BYTECODE TRANSLATION

Slide 18

Slide 18 text

BYTECODE TRANSLATION Bytecode translation

Slide 19

Slide 19 text

PERFORMANCE RESULTS

Slide 20

Slide 20 text

[Bar chart: Untyped JS Benchmarks. Series: Hermes 2023, Hermes 2024. Benchmarks: Box2D, Crypto, Gameboy, Navier-stokes, Richards, N-body, TS Raytracer. Axis range: 0 to 1.5.]

Slide 21

Slide 21 text

[Bar chart: Typed Benchmarks. Benchmarks: Raytracer, nbody. Legend text: Hermes Untyped Typed Native. Axis range: 0 to 14.]

Slide 22

Slide 22 text

RENDERING A MANDELBROT SET
SOMETIMES THE NEW PERFORMANCE IS JUST FUN
https://github.com/tmikov/mandelbrot-demo

Slide 23

Slide 23 text

Hermes 2023: 206 ms/frame
Hermes 2024 with bytecode translation: 44 ms/frame
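To put the two frame times in perspective, the arithmetic below converts them into a speedup factor and frames per second. Only the two ms/frame figures come from the slide; the helper names are made up for this sketch.

```typescript
// Frame times reported on the slide, in milliseconds per frame.
const hermes2023Ms = 206;
const hermes2024Ms = 44;

// How many times faster the new frame time is than the old one.
function speedupFactor(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

// Frames rendered per second at a given frame time.
function framesPerSecond(frameMs: number): number {
  return 1000 / frameMs;
}

const factor = speedupFactor(hermes2023Ms, hermes2024Ms); // roughly 4.7x
const fpsBefore = framesPerSecond(hermes2023Ms);          // just under 5 fps
const fpsAfter = framesPerSecond(hermes2024Ms);           // just under 23 fps
```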

Slide 24

Slide 24 text

DEEP DIVE: HOW DOES IT WORK?
A brief tutorial on building a Hermes JIT
(but seriously, do not do this at home!)

Slide 25

Slide 25 text

IN THE BEGINNING: HERMES BYTECODE
Function(2 params, 2 registers):
    LoadParam       r0, 1
    GetByIdShort    r1, r0, 1, "prop"
    LoadConstUInt8  r0, 100
    Mul             r1, r1, r0
    LoadConstUInt8  r0, 1
    SubN            r0, r1, r0
    Ret             r0
function getprop(o) {
  return (o.prop * 100) - 1;
}
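One way to read the listing is as a program for a small register machine. The toy interpreter below is a sketch under stated assumptions, not Hermes code: the `Instr` shape, the `run` function, and the simplified opcode semantics are all hypothetical. It assumes parameter index 0 is `this`, so that `LoadParam r0, 1` loads `o`, that `Mul` converts its operands to numbers, and that `SubN` may assume both operands are already numbers. The cache operand of `GetByIdShort` is dropped.

```typescript
// Hypothetical, simplified encoding of the instructions on the slide.
type Instr =
  | { op: "LoadParam"; dst: number; index: number }
  | { op: "GetById"; dst: number; obj: number; name: string }
  | { op: "LoadConstUInt8"; dst: number; value: number }
  | { op: "Mul"; dst: number; lhs: number; rhs: number }
  | { op: "SubN"; dst: number; lhs: number; rhs: number }
  | { op: "Ret"; src: number };

// Executes the instructions one by one over an array of virtual registers.
function run(code: Instr[], registerCount: number, params: unknown[]): unknown {
  const r: unknown[] = [];
  for (let i = 0; i < registerCount; i++) r.push(undefined);

  for (let pc = 0; pc < code.length; pc++) {
    const ins = code[pc];
    switch (ins.op) {
      case "LoadParam":
        r[ins.dst] = params[ins.index];
        break;
      case "GetById":
        // Generic property lookup by name.
        r[ins.dst] = (r[ins.obj] as { [key: string]: unknown })[ins.name];
        break;
      case "LoadConstUInt8":
        r[ins.dst] = ins.value;
        break;
      case "Mul":
        // Untyped multiply: operands may be anything, so convert first.
        r[ins.dst] = Number(r[ins.lhs]) * Number(r[ins.rhs]);
        break;
      case "SubN":
        // Assumed meaning of the N suffix: operands are known to be numbers.
        r[ins.dst] = (r[ins.lhs] as number) - (r[ins.rhs] as number);
        break;
      case "Ret":
        return r[ins.src];
    }
  }
  return undefined;
}

// The bytecode from the slide, transcribed into the toy encoding.
const getprop: Instr[] = [
  { op: "LoadParam", dst: 0, index: 1 },
  { op: "GetById", dst: 1, obj: 0, name: "prop" },
  { op: "LoadConstUInt8", dst: 0, value: 100 },
  { op: "Mul", dst: 1, lhs: 1, rhs: 0 },
  { op: "LoadConstUInt8", dst: 0, value: 1 },
  { op: "SubN", dst: 0, lhs: 1, rhs: 0 },
  { op: "Ret", src: 0 },
];

// params[0] stands in for `this`, params[1] is the argument `o`.
const result = run(getprop, 2, [undefined, { prop: 3 }]);
```

The per-instruction decode and dispatch in the loop is the overhead that translating bytecode to machine instructions aims to remove.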

Slide 29

Slide 29 text

Function
LoadParam r0, 1
    mov x0, x19
    mov w1, 1
    bl _sh_ljs_param                ; function call
    str x0, [x20, 16]
GetByIdShort r1, r0, 1, "prop"
    mov x0, x19
    add x1, x20, 16
    mov w2, 658
    ldr x3, [RO_DATA]
    add x3, x3, 16
    bl _sh_ljs_get_by_id_rjs        ; function call
    str x0, [x20, 24]
LoadConstUInt8 r0, 100
    mov x1, 100
    str x1, [x20, 16]
Mul r1, r1, r0
    mov x0, x19
    add x1, x20, 24
    add x2, x20, 16
    bl _sh_ljs_mul_rjs              ; function call
    fmov d1, x0
LoadConstUInt8 r0, 1
    fmov d0, 1
SubN r0, r1, r0
    fsub d0, d1, d0
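The slide suggests that each bytecode instruction expands into a short, fixed sequence of machine instructions. The toy emitter below sketches that idea by producing assembly text for three of the instructions. It is a hypothetical illustration, not the Hermes translator: `Op`, `slot`, and `translate` are invented names, and the slot formula 16 + 8 * n is read off the slide's offsets for r0 and r1 and is an assumption for any other register. Unlike the slide, which leaves the `Mul` result in a floating-point register, this sketch always stores results back to the register slot.

```typescript
// Hypothetical subset of bytecode instructions handled by the sketch.
type Op =
  | { op: "LoadParam"; dst: number; index: number }
  | { op: "LoadConstUInt8"; dst: number; value: number }
  | { op: "Mul"; dst: number; lhs: number; rhs: number };

// Frame offset of virtual register rN, inferred from the slide: r0 -> 16, r1 -> 24.
function slot(reg: number): number {
  return 16 + 8 * reg;
}

// Expands one bytecode instruction into a fixed template of assembly lines.
function translate(ins: Op): string[] {
  switch (ins.op) {
    case "LoadParam":
      return [
        "mov x0, x19",
        `mov w1, ${ins.index}`,
        "bl _sh_ljs_param",
        `str x0, [x20, ${slot(ins.dst)}]`,
      ];
    case "LoadConstUInt8":
      return [`mov x1, ${ins.value}`, `str x1, [x20, ${slot(ins.dst)}]`];
    case "Mul":
      return [
        "mov x0, x19",
        `add x1, x20, ${slot(ins.lhs)}`,
        `add x2, x20, ${slot(ins.rhs)}`,
        "bl _sh_ljs_mul_rjs",
        // Sketch-only choice: store the result instead of keeping it in d1.
        `str x0, [x20, ${slot(ins.dst)}]`,
      ];
  }
}

// Translating a whole function is just concatenating the templates.
function translateAll(code: Op[]): string[] {
  const out: string[] = [];
  for (const ins of code) {
    for (const line of translate(ins)) out.push(line);
  }
  return out;
}

const loadParamAsm = translate({ op: "LoadParam", dst: 0, index: 1 });
```

Because each template is chosen by looking at a single instruction, the translation stays cheap, which fits the talk's description of it as lightweight.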

Slide 34

Slide 34 text

WE CAN DO EVEN BETTER
• There were lots of function calls
• Function calls can be relatively expensive
• JS is a funny language
• Almost everything is valid. But:
  • Frequent patterns are cheap
  • Uncommon ones are expensive

Slide 35

Slide 35 text

WE CAN DO EVEN BETTER
• There were lots of function calls
• Function calls can be relatively expensive
• JS is a funny language
• Almost everything is valid. But:
  • Frequent patterns are cheap: 123 * 456
  • Uncommon ones are expensive: "Joe Native" * [2,3]

Slide 36

Slide 36 text

FAST AND SLOW PATHS
• Split an expensive operation into two parts:
  • A slow path function call for all complicated and weird cases ("string" * [1,2,3])
  • A fast path for the simple and fast cases (123 * 456)
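The same split can be sketched at the source level. This is an analogy for the machine-code pattern, not Hermes code: `mul`, `slowMul`, and the `slowPathCalls` counter are hypothetical names, and the slow path uses `Number()` conversion as a simplified stand-in for the full JavaScript multiplication rules (BigInt operands, for instance, are not modeled).

```typescript
// Counts how often the generic path runs, to make the split observable.
let slowPathCalls = 0;

// Slow path: an out-of-line call that copes with any operand kind.
function slowMul(a: unknown, b: unknown): number {
  slowPathCalls++;
  return Number(a) * Number(b);
}

// Multiply with a cheap inline check guarding the common case.
function mul(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") {
    return a * b; // fast path: both operands are already numbers
  }
  return slowMul(a, b); // slow path: strings, arrays, objects, ...
}

const common = mul(123, 456);             // stays on the fast path
const weird = mul("Joe Native", [2, 3]);  // falls back to the slow path, yields NaN
```

The design choice is that the check costs only a comparison and a branch, so the frequent numeric case never pays for a function call.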

Slide 37

Slide 37 text

FAST AND SLOW PATHS
Mul r1, r1, r0
    str x0, [x20, 24]
    str x1, [x20, 16]
    cmp x0, x21                     ; CHECK
    b.hs SLOW_1
    fmov d0, x1                     ; FAST PATH
    fmov d1, x0
    fmul d1, d1, d0
CONT_1:
    ...
SLOW_1:                             ; SLOW PATH
    mov x0, x19
    add x1, x20, 24
    add x2, x20, 16
    bl _sh_ljs_mul_rjs
    fmov d1, x0
    b CONT_1

Slide 38

Slide 38 text

COMMON EXECUTION TRACE
LoadConstUInt8 r0, 100
    mov x1, 100
Mul r1, r1, r0
    str x0, [x20, 24]
    str x1, [x20, 16]
    cmp x0, x21
    b.hs SLOW_1
    fmov d0, x1
    fmov d1, x0
    fmul d1, d1, d0
LoadConstUInt8 r0, 1
    fmov d0, 1.0
SubN r0, r1, r0
    fsub d0, d1, d0
No calls!

Slide 39

Slide 39 text

STATIC HERMES
• Static Hermes understands type annotations
• Emits bytecode instructions that know the types of their operands
• Typed bytecode results in much faster machine instruction sequences

Slide 40

Slide 40 text

HERMES TYPED BYTECODE
Function:
    LoadParam        r0, 1
    GetOwnBySlotIdx  r1, r0, 0
    LoadConstUInt8   r0, 100
    MulN             r1, r1, r0
    LoadConstUInt8   r0, 1
    SubN             r0, r1, r0
    Ret              r0
type Obj = {prop: number};
function getprop(o: Obj): number {
  return (o.prop * 100) - 1;
}

Slide 41

Slide 41 text

TYPED VS UNTYPED BYTECODE
Untyped:
    LoadParam        r0, 1
    GetByIdShort     r1, r0, 1, "prop"
    LoadConstUInt8   r0, 100
    Mul              r1, r1, r0
    LoadConstUInt8   r0, 1
    SubN             r0, r1, r0
    Ret              r0
Typed:
    LoadParam        r0, 1
    GetOwnBySlotIdx  r1, r0, 0
    LoadConstUInt8   r0, 100
    MulN             r1, r1, r0
    LoadConstUInt8   r0, 1
    SubN             r0, r1, r0
    Ret              r0
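The difference between the two property reads can be sketched with a toy object model. Everything here is hypothetical and is not Hermes's actual object layout: a `Shape` maps property names to slot indices, and each object keeps its values in a flat `slots` array. The function names only echo the opcode names on the slide.

```typescript
// Hypothetical object model: names map to slot indices, values live in slots.
interface Shape {
  [name: string]: number;
}
interface Obj {
  shape: Shape;
  slots: unknown[];
}

// Untyped read, in the spirit of GetByIdShort: resolve the name at run time.
function getById(o: Obj, name: string): unknown {
  if (!Object.prototype.hasOwnProperty.call(o.shape, name)) {
    return undefined; // property is not present on this object
  }
  return o.slots[o.shape[name]];
}

// Typed read, in the spirit of GetOwnBySlotIdx: the compiler already knows
// which slot holds the property, so no name lookup happens at run time.
function getOwnBySlotIdx(o: Obj, slotIdx: number): unknown {
  return o.slots[slotIdx];
}

const objShape: Shape = { prop: 0 };
const obj: Obj = { shape: objShape, slots: [7] };

const viaName = getById(obj, "prop");   // looks up "prop", then reads slot 0
const viaSlot = getOwnBySlotIdx(obj, 0); // reads slot 0 directly
```

Both calls read the same value; the typed version simply skips the work that the type annotation made unnecessary.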

Slide 42

Slide 42 text

Function
LoadParam r0, 1
    ldur x0, [x20, -72]
GetOwnBySlotIdx r1, r0, 0
    mov x1, x0
    and x1, x1, 0x0000ffffffffffff
    ldr x1, [x1, 48]
LoadConstUInt8 r0, 100
    mov x0, 100.0
MulN r1, r1, r0
    fmov d0, x1
    fmov d1, x0
    fmul d0, d0, d1
LoadConstUInt8 r0, 1
    fmov d1, 1.0
SubN r0, r1, r0
    fsub d1, d0, d1

Slide 43

Slide 43 text

TAKEAWAYS
• Moderate speedups for untyped code
  • Can be used for existing code and npm modules
• Great speedups when using static types
  • React Native will use it for framework hot code
  • Developers could optionally use types to speed up non-framework code

Slide 44

Slide 44 text

HERMES V2: BUT WHEN?
• When will all of this be enabled in RN by default?
• As usual, everything is available on our GitHub
• We follow a process where we release to RN after we have tested things internally

Slide 45

Slide 45 text

HERMES V2: BUT WHEN?
• In order, from sooner to later:
  • Better language support (classes, etc.)
  • Bytecode Translation
  • Static types

Slide 46

Slide 46 text

THANK YOU!
• https://github.com/facebook/hermes