Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Name: @brn (Taketoshi Aono)
Occupation: Web Frontend Engineer
Company: Cyberagent.inc
Blog: http://abcdef.gets.b6n.ch/
Twitter: https://twitter.com/brn227
GitHub: https://github.com/brn

Slide 3

Slide 3 text

Agenda
•  What is V8?
•  Execution flow of V8
•  Parsing
•  Abstract Syntax Tree
•  Ignition – BytecodeInterpreter
•  CodeStubAssembler
•  Builtins / Runtime
•  Optimization / Hidden Class / Inline Caching
•  TurboFan / Deoptimization

Slide 4

Slide 4 text

What is V8?
V8 is a JavaScript engine developed by Google. It is used as the core engine of Google Chrome and Node.js.

Slide 5

Slide 5 text

Execution Flow

Slide 6

Slide 6 text

Source → AST → Bytecode (first time) → Graph → Assembly (hot code)

Slide 7

Slide 7 text

Parsing

Slide 8

Slide 8 text

Basic parsing
V8 turns source code into an AST by parsing it. AST is an abbreviation of Abstract Syntax Tree.

Slide 9

Slide 9 text

if (x) { x = 100 }

Slide 10

Slide 10 text

if (x) { x = 100 }

IF
  CONDITION
  THEN
    BLOCK
      EXPRESSION STATEMENT
        ASSIGN
          VAR PROXY (X)
          LITERAL (100)

Slide 11

Slide 11 text

Problems

Slide 12

Slide 12 text

Parsing all functions - Slow
Parsing all of the source code up front is a bad strategy: if the parsed code is never executed, the time spent parsing it is wasted.

Slide 13

Slide 13 text

Split parsing phase
So V8 splits the parsing phase in order to parse lazily.

Slide 14

Slide 14 text

PreParsing
Parse the function layout in advance.

Slide 15

Slide 15 text

function x(a, b) { return a + b; }

FUNCTION(X)
  parameter-count: 2
  start-position: 1
  end-position: 34
  use-super-property: false
  …

Slide 16

Slide 16 text

// when x is called
x()

FUNCTION NAME (x)
  RETURN
    LITERAL(1)

Slide 17

Slide 17 text

Lazy Parsing
V8 delays parsing a function until it is called at runtime. A function is compiled only when it is called.
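
The idea of deferring the full parse until the first call can be imitated in plain JavaScript. This is only a rough sketch under stated assumptions: makeLazyFunction and its layout fields are hypothetical names, and new Function stands in for V8's real parser and compiler.

```javascript
// Hypothetical helper: keeps only a small "layout" up front and
// compiles the body the first time the function is called.
function makeLazyFunction(params, body) {
  const layout = { parameterCount: params.length, bodyLength: body.length };
  let compiled = null;

  function call(...args) {
    if (compiled === null) {
      // The full parse and compile happen here, on first call only.
      compiled = new Function(...params, body);
    }
    return compiled(...args);
  }

  return { layout, call, isCompiled: () => compiled !== null };
}

const lazyAdd = makeLazyFunction(["a", "b"], "return a + b;");
console.log(lazyAdd.layout.parameterCount, lazyAdd.isCompiled()); // 2 false
console.log(lazyAdd.call(1, 2), lazyAdd.isCompiled());            // 3 true
```

A function that is never called never pays for the full parse, which is the point of the split.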

Slide 18

Slide 18 text

More About
https://docs.google.com/presentation/d/1b-ALt6W01nIxutFVFmXMOyd_6ou_6qqP6S0Prmb1iDs/present?slide=id.p

Slide 19

Slide 19 text

Abstract Syntax Tree

Slide 20

Slide 20 text

AST Rewriting
V8 rewrites the AST. Here are some examples.

Slide 21

Slide 21 text

Subclass constructor return
V8 transforms the return statement of a derived constructor. If the constructor returns a value, V8 rewrites the return into a ternary expression that returns this when the returned value is undefined.

Slide 22

Slide 22 text

constructor() {
  super();
  return expr
}

// is transformed into

constructor() {
  super();
  var temp;
  return (temp = expr) === undefined ? this : temp;
}
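
The observable effect of this rewrite can be checked from plain JavaScript. In the sketch below, Base and Derived are hypothetical classes that are not from the slides; the constructor argument plays the role of expr.

```javascript
class Base {
  constructor() {
    this.tag = "base";
  }
}

// Hypothetical class for illustration: `expr` is supplied by the caller.
class Derived extends Base {
  constructor(expr) {
    super();
    // Conceptually rewritten to:
    //   var temp; return (temp = expr) === undefined ? this : temp;
    return expr;
  }
}

const a = new Derived(undefined);           // undefined, so `this` is returned
const replacement = { tag: "replacement" };
const b = new Derived(replacement);         // an object replaces `this`

console.log(a instanceof Derived, a.tag); // true base
console.log(b === replacement, b.tag);    // true replacement
```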

Slide 23

Slide 23 text

for (let/const/var in/of e)
To support const or let in the initializer of a for-of/for-in statement, V8 moves the whole statement into a block.

Slide 24

Slide 24 text

for (const key of e) {
  ...
}

Slide 25

Slide 25 text

{
  var temp;
  for (temp of e) {
    const key = temp;
    ...
  }
  let key
}
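
The following sketch compares the original loop with a hand-written version of the desugared form, using closures to show that each iteration still gets its own binding. The array e and the closure lists are illustrative assumptions, and the trailing let declaration from the slide is left out because it has no effect on the result here.

```javascript
const e = ["a", "b", "c"];

// Original form: each iteration gets its own `key` binding.
const direct = [];
for (const key of e) {
  direct.push(() => key);
}

// Hand-written equivalent of the desugared form.
const desugared = [];
{
  var temp;
  for (temp of e) {
    const key = temp; // fresh binding per iteration
    desugared.push(() => key);
  }
}

console.log(direct.map((f) => f()));    // [ 'a', 'b', 'c' ]
console.log(desugared.map((f) => f())); // [ 'a', 'b', 'c' ]
```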

Slide 26

Slide 26 text

Spread operator
Array spread is replaced by a do expression and a for-of loop.

Slide 27

Slide 27 text

const x = [1,2,3];
const y = [...x];

Slide 28

Slide 28 text

do {
  $R = [];
  for ($i of x)
    %AppendElement($R, $i);
  $R
}
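
Since do expressions and %AppendElement are not available in ordinary JavaScript, the same shape can be approximated with a function. This is a sketch, assuming a hypothetical helper named spreadByForOf in which push stands in for the V8-internal %AppendElement.

```javascript
// Hypothetical helper that mirrors the desugared form.
function spreadByForOf(iterable) {
  const result = [];              // $R = [];
  for (const item of iterable) {  // for ($i of x)
    result.push(item);            //   %AppendElement($R, $i);
  }
  return result;                  // $R
}

const x = [1, 2, 3];
const y = [...x];
const z = spreadByForOf(x);
console.log(y, z); // [ 1, 2, 3 ] [ 1, 2, 3 ]
```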

Slide 29

Slide 29 text

ECMAScript? – Binary AST
The AST is fairly large, so ECMAScript has a 'Binary AST' proposal, which proposes a compressed form of the AST.

Slide 30

Slide 30 text

Ignition

Slide 31

Slide 31 text

Bytecode Interpreter
V8 executes the AST by transforming it into bytecodes, each 1 to 4 bytes in size.

Slide 32

Slide 32 text

How does it work?
It is a register-based interpreter with a single accumulator register. Do you understand? I can't :(

Slide 33

Slide 33 text

Pseudo JavaScript code
Here is some pseudo JavaScript code that shows how Ignition works.

Slide 34

Slide 34 text

const Bytecodes = [0,1,2,3,4,5];
let index = 0;
function dispatch(next) {BytecodeHandlers[next]();}
const BytecodeHandlers = [
  () => {...; dispatch(Bytecodes[index++])},
  () => {...; dispatch(Bytecodes[index++])},
  () => {...; dispatch(Bytecodes[index++])},
  () => {...; dispatch(Bytecodes[index++])},
  () => {...; dispatch(Bytecodes[index++])},
  () => {...; dispatch(Bytecodes[index++])},
]
dispatch(Bytecodes[index++]);
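
To make the accumulator idea concrete, here is a small runnable variant. It is a sketch under stated assumptions: the opcode names are borrowed from bytecodes that appear later in these slides, but the numeric encodings, the register count, and the run function are invented for illustration. It also uses a central loop instead of having each handler dispatch the next one, which keeps the example short.

```javascript
// Hypothetical opcodes; these are not V8's real bytecode encodings.
const Op = { LdaSmi: 0, Star: 1, Add: 2, Return: 3 };

function run(bytecodes) {
  const registers = [0, 0, 0, 0];
  let accumulator = 0;
  let pc = 0;
  let done = false;

  // One handler per opcode; each reads its operand and advances pc.
  const handlers = [];
  handlers[Op.LdaSmi] = () => { accumulator = bytecodes[pc++]; };            // acc = immediate
  handlers[Op.Star] = () => { registers[bytecodes[pc++]] = accumulator; };   // reg = acc
  handlers[Op.Add] = () => { accumulator += registers[bytecodes[pc++]]; };   // acc += reg
  handlers[Op.Return] = () => { done = true; };

  while (!done) {
    const opcode = bytecodes[pc++];
    handlers[opcode](); // look up the handler in the dispatch table
  }
  return accumulator;
}

// Computes 1 + 2 through the accumulator.
const program = [
  Op.LdaSmi, 1,
  Op.Star, 0,
  Op.LdaSmi, 2,
  Op.Add, 0,
  Op.Return,
];
console.log(run(program)); // 3
```

Every operation reads from or writes to the accumulator, so most bytecodes need at most one explicit operand.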

Slide 35

Slide 35 text

How to create bytecode?
Bytecode is created by AstVisitor, a visitor-pattern class that walks the AST depth-first and invokes a callback for each AST node.
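
The visitor idea can be sketched in a few lines. Everything here is an assumption made for illustration: the node shapes (Literal, BinaryAdd, Return), the tuple format of the emitted bytecode, and the generateBytecode function are not V8's actual classes.

```javascript
// Hypothetical node shapes and opcode names, for illustration only.
function generateBytecode(ast) {
  const code = [];
  let nextRegister = 0;

  const visitors = {
    Literal(node) {
      code.push(["LdaSmi", node.value]); // load the literal into the accumulator
    },
    BinaryAdd(node) {
      visit(node.left);                  // left result lands in the accumulator
      const reg = nextRegister++;
      code.push(["Star", reg]);          // save it in a register
      visit(node.right);                 // right result lands in the accumulator
      code.push(["Add", reg]);           // accumulator += register
    },
    Return(node) {
      visit(node.argument);
      code.push(["Return"]);
    },
  };

  // Depth-first walk: the callback is chosen by node type.
  function visit(node) {
    visitors[node.type](node);
  }

  visit(ast);
  return code;
}

const ast = {
  type: "Return",
  argument: {
    type: "BinaryAdd",
    left: { type: "Literal", value: 1 },
    right: { type: "Literal", value: 2 },
  },
};
console.log(generateBytecode(ast));
```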

Slide 36

Slide 36 text

BytecodeArray
Bytecode is stored in a BytecodeArray. Each JavaScript function has its own BytecodeArray.

Slide 37

Slide 37 text

[Diagram: BytecodeArray → Dispatch Table → Stub (Machine Code)]
Find and execute the corresponding handler in the dispatch table.

Slide 38

Slide 38 text

InterpreterEntryTrampoline
Finally, the generated bytecode is invoked from a builtin named InterpreterEntryTrampoline. InterpreterEntryTrampoline is written in assembly and is called as a C function.

Slide 39

Slide 39 text

[Diagram: Script::Run → (call as C function) → InterpreterEntryTrampoline (Assembly) → (dispatch first bytecode) → Ignition DispatchTable]

Slide 40

Slide 40 text

Ignition Handler
The array named BytecodeHandlers in the pseudo JavaScript code corresponds to what V8 calls Ignition Handlers. Ignition Handlers are written in a DSL named CodeStubAssembler.

Slide 41

Slide 41 text

CodeStubAssembler

Slide 42

Slide 42 text

What is CodeStubAssembler?
CodeStubAssembler (CSA) abstracts code generation as graph creation. It only creates nodes that are scheduled for execution, and the CodeGenerator converts them to architecture-dependent code, so you do not need to be an expert in assembly language.

Slide 43

Slide 43 text

IGNITION_HANDLER(JumpIfToBooleanFalse, InterpreterAssembler) {
  // Get Accumulator value.
  Node* value = GetAccumulator();
  // Get operand value from arguments.
  Node* relative_jump = BytecodeOperandUImmWord(0);
  Label if_true(this), if_false(this);
  // If value will true jump to if_true,
  // otherwise jump to if_false.
  BranchIfToBooleanIsTrue(value, &if_true, &if_false);
  Bind(&if_true);
  Dispatch();
  Bind(&if_false);
  // Jump to operand bytecode.
  Jump(relative_jump);
}

Slide 44

Slide 44 text

Graph based DSL
CodeStubAssembler makes the code very simple and clean, so new language features can be added quickly.

Slide 45

Slide 45 text

[Diagram: IGNITION_HANDLER → graph of Nodes and Operators → Assemble → Stub (Machine Code Fragment) → Dispatch Table]
Create code from the graph. Register the code at the corresponding index of the dispatch table.

Slide 46

Slide 46 text

Assembler
Emits architecture-dependent code. Let's look at the jmp mnemonic.

Slide 47

Slide 47 text

void Assembler::jmp(
    Handle<Code> target,
    RelocInfo::Mode rmode
) {
  EnsureSpace ensure_space(this);
  // 1110 1001 #32-bit disp.
  // Emit assembly.
  emit(0xE9);
  emit_code_target(target, rmode);
}

Slide 48

Slide 48 text

Where to use
Builtins use the Assembler class to write architecture-dependent stubs, but some of them are CSA-based (*-gen.cc). Almost all Ignition Handlers are written in CSA.

Slide 49

Slide 49 text

Builtins & Runtime

Slide 50

Slide 50 text

Builtins
Builtins are a collection of assembly code fragments that are compiled during V8 initialization. They are called stubs. Runtime optimization is not applied to them.

Slide 51

Slide 51 text

Runtime
Runtime functions are written in C++ and are invoked from Builtins or other assembler code. They are code fragments that connect JavaScript and C++. They are not optimized at runtime.

Slide 52

Slide 52 text

Hidden Class

Slide 53

Slide 53 text

What is Hidden Class?
JavaScript is a dynamically typed language, so V8 treats the structure of an object as if it were a type. This is called a Hidden Class.

Slide 54

Slide 54 text

•  Hidden Class
const point1 = {x: 0, y: 0};
const point2 = {x: 0, y: 0};

Map
FixedArray [
  {x: {offset: 0}},
  {y: {offset: 1}}
]

Slide 55

Slide 55 text

Map
JavaScript does not treat these objects as the same object. But if objects have the same structure, they share the same Hidden Class. The data that stores this structure is called a Map.

Slide 56

Slide 56 text

const pointA = {x: 0, y: 0};
const pointB = {x: 0, y: 0};
// pointA.Map === pointB.Map;

const pointC = {y: 0, x: 0};
// pointA.Map !== pointC.Map

const point3D = {x: 0, y: 0, z: 0};
// point3D.Map !== pointA.Map

Slide 57

Slide 57 text

class PointA {
  constructor() {
    this.x = 0;
    this.y = 0;
  }
}
const pointAInstance = new PointA();

class PointB {
  constructor() {
    this.y = 0;
    this.x = 0;
  }
}
const pointBInstance = new PointB();
// pointAInstance.Map !== pointBInstance.Map

Slide 58

Slide 58 text

Layout
A Map tracks object layout very strictly: if the order of properties in a literal, the order of property initialization, or the number of properties differs, a different Map is allocated.

Slide 59

Slide 59 text

Map Transition
But isn't it very costly to allocate a whole new Map every time a property is added? So V8 shares the existing Map when a property is added and creates a new Map that contains only the new property. This is called a Map Transition.

Slide 60

Slide 60 text

function Point(x, y) {
  this.x = x;
  this.y = y;
}
Map
FixedArray [
  {x: {offset: 0}},
  {y: {offset: 1}},
]

var x = new Point(1, 1);
x.z = 1;
Map
FixedArray [
  {z: {offset: 2}}
]

transition {z: transition(address)}
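
Map sharing and transitions can be modeled with a small tree of objects. This is a toy sketch, not V8's data structure: ToyMap, rootMap, and mapFor are hypothetical names, and for simplicity each ToyMap copies the full offset table instead of storing only the newly added property.

```javascript
// Toy model of hidden classes; class and field names are hypothetical.
class ToyMap {
  constructor(offsets = {}) {
    this.offsets = offsets;       // property name -> slot offset
    this.transitions = new Map(); // property name -> next ToyMap
  }

  // Return the map reached by adding `name`, creating it at most once.
  transition(name) {
    if (!this.transitions.has(name)) {
      const nextOffset = Object.keys(this.offsets).length;
      this.transitions.set(
        name,
        new ToyMap({ ...this.offsets, [name]: nextOffset })
      );
    }
    return this.transitions.get(name);
  }
}

const rootMap = new ToyMap();

// Walk the transition tree in property-definition order.
function mapFor(propertyNames) {
  return propertyNames.reduce((map, name) => map.transition(name), rootMap);
}

const mapA = mapFor(["x", "y"]);
const mapB = mapFor(["x", "y"]);
const mapC = mapFor(["y", "x"]);
const map3D = mapFor(["x", "y", "z"]);

console.log(mapA === mapB); // true: same properties in the same order
console.log(mapA === mapC); // false: different order
console.log(map3D.offsets); // { x: 0, y: 1, z: 2 }
console.log(mapA.transitions.get("z") === map3D); // true
```

The last line shows the transition link: adding z to an {x, y} object reuses the existing {x, y} map and follows its z transition.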

Slide 61

Slide 61 text

What's Happening?
Why do Hidden Classes exist? Because they make property access and type checks faster and safer.

Slide 62

Slide 62 text

Inline Caching

Slide 63

Slide 63 text

What is Inline Caching?
It caches the Map and offset of an accessed object to speed up property access.

Slide 64

Slide 64 text

function x(obj) {
  return obj.x + obj.y;
}

x({x: 0, y: 0});
x({x: 1, y: 1});
x({x: 2, y: 2});

Slide 65

Slide 65 text

Search Property
To find a property on an object, V8 has to search a HashMap or FixedArray. Doing this search on every property access is very slow.

Slide 66

Slide 66 text

Reduce Property Access
In the example, x and y are accessed repeatedly on objects that have the same Map. If V8 already knows that obj has the {x, y} Map, it knows the memory layout of the object, so it can access the offset directly, which is faster.

Slide 67

Slide 67 text

Cache
So V8 remembers accesses for a specific Map. When V8 accesses a property, it records the Map so that later accesses to that property are faster.

Slide 68

Slide 68 text

x({x: 0, y: 0});            // uninitialized
x({x: 1, y: 1});            // stub_call
x({x: 2, y: 2});            // found ic
x({x: 1, y: 1, z: 1})       // load ic miss
x({x: 1, y: 1, foo() {}});  // load ic miss

Slide 69

Slide 69 text

Cache Miss
A cache miss occurs when the Map changes; the property is then looked up again and the new Map is stored in the cache. But it is impossible to record every Map, so at most 4 Maps are recorded.

Slide 70

Slide 70 text

Cache State
The cache has the following states.
-  PreMonomorphic
-  Monomorphic
-  Polymorphic
-  Megamorphic

Slide 71

Slide 71 text

Pre Monomorphic
This is the initialization state. It exists only for implementation convenience, so it is not meaningful for us.

Slide 72

Slide 72 text

Monomorphic
The state in which exactly one Map is cached. This is the ideal state.

Slide 73

Slide 73 text

Polymorphic
Several Maps are stored in a FixedArray, and they are searched each time the property is accessed. The cache is still in use, so access is still fast.

Slide 74

Slide 74 text

Megamorphic
When too many cache misses occur, V8 stops recording Maps and always calls the GetProperty function from the stub. This is a very slow state.
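
The state changes described above can be traced with a toy inline cache. This is a sketch under loud assumptions: the ordered key list of an object stands in for its Map (real V8 compares Map pointers), and createLoadIC, shapeOf, and the state strings are hypothetical names. The limit of 4 is the one stated on the Cache Miss slide.

```javascript
// Toy inline cache. Assumption: the "Map" of an object is approximated
// by its ordered key list.
const MAX_POLYMORPHIC = 4;

function shapeOf(obj) {
  return Object.keys(obj).join(",");
}

function createLoadIC(propertyName) {
  const entries = []; // cached shapes, standing in for (Map, offset) pairs
  let state = "uninitialized";

  function load(obj) {
    if (state !== "megamorphic") {
      const shape = shapeOf(obj);
      if (!entries.includes(shape)) {
        // Cache miss: record the new shape, or give up after too many.
        if (entries.length < MAX_POLYMORPHIC) {
          entries.push(shape);
          state = entries.length === 1 ? "monomorphic" : "polymorphic";
        } else {
          entries.length = 0;
          state = "megamorphic";
        }
      }
    }
    return obj[propertyName]; // generic lookup stands in for the fast path
  }

  return { load, getState: () => state };
}

const ic = createLoadIC("x");
ic.load({ x: 1 });
console.log(ic.getState()); // monomorphic
ic.load({ x: 1, y: 1 });
console.log(ic.getState()); // polymorphic
ic.load({ x: 1, z: 1 });
ic.load({ x: 1, a: 1 });
ic.load({ x: 1, b: 1 }); // fifth distinct shape exceeds the limit
console.log(ic.getState()); // megamorphic
```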

Slide 75

Slide 75 text

Optimization

Slide 76

Slide 76 text

Hot or Small
Optimizing code every time would waste resources, so V8 optimizes code when the conditions below are satisfied.
-  The function has been called (bytecode length of the function / 1200) + 2 times and has exhausted its budget.
-  The function is very small (bytecode length is less than 90).
-  Loops

Slide 77

Slide 77 text

Optimization Budget
An optimization budget is assigned to each function. When a function exhausts its budget, it becomes a candidate for optimization.

Slide 78

Slide 78 text

For Loop
V8 emits a JumpLoop bytecode for a loop statement. In JumpLoop, V8 subtracts a weight, which is the offset of the backward jump, from the budget. If the budget drops below 0, optimization occurs.

Slide 79

Slide 79 text

function id(v) {return v;}
function x(v) {
  for (var i = 0; i < 10000; i++) {
    id(v + i);
  }
}
x(1);

Slide 80

Slide 80 text

0x1bb9e5e2935e LdaSmi.Wide [1000]
0x1bb9e5e2937e JumpLoop [32], [0] (0x1bb9e5e2935e @ 4)

Bytecode length = 100
if ((budget -= 100) < 0) {
  OptimizeAndOSR();
}

Slide 81

Slide 81 text

OSR - OnStackReplacement
Optimized code is installed by replacing the jump address. This is called OSR (On-Stack Replacement).

Slide 82

Slide 82 text

For Function
V8 emits a Return bytecode for a function and checks the budget in that bytecode.

Slide 83

Slide 83 text

function x() {
  const x = 1 + 1;
}
x();

Slide 84

Slide 84 text

0x3d22953a917a StackCheck
0x3d22953a9180 Return

Bytecode length 30
if ((budget -= 30) < 0) {
  OptimizeConcurrent();
}

Slide 85

Slide 85 text

Concurrent Compilation
Functions are optimized concurrently, so the next call of the function might not be optimized yet.

Slide 86

Slide 86 text

[Diagram: CompilationQueue holding CompilationJobs. Hot Function: bytecode called → Hot Function (queued): bytecode called → Hot Function (queued): bytecode called → Optimized Function: assembly called]

Slide 87

Slide 87 text

const x = x => x;
const y = () => {
  for (let i = 0; i < 1000; i++) {
    x(i);
  }

  for (let i = 0; i < 1000; i++) {
    x(i);
  }
};
y();

Slide 88

Slide 88 text

0x13b567fa924e LdaSmi.Wide [1000]
0x13b567fa9268 JumpLoop [26], [0] (0x13b567fa924e @ 4)
Bytecode length 26
budget -= 26

0x13b567fa926e LdaSmi.Wide [1000]
0x13b567fa9288 JumpLoop [26], [0] (0x13b567fa926e @ 36)
Bytecode length 26
budget -= 26

0x13b567fa928c Return
budget -= all_bytecode_length

Slide 89

Slide 89 text

Budget for function
Even if a loop is split, the whole budget is checked in the Return bytecode, so the function is still optimized properly.
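
The budget bookkeeping can be simulated with a few lines of JavaScript. This is a toy model under stated assumptions: the starting budgets (1000 and 100) are made-up numbers, createFunctionState, onJumpLoop, and onReturn are hypothetical names, and setting a flag stands in for OptimizeAndOSR() and OptimizeConcurrent(). The weights 26 and 30 are the bytecode lengths shown on the slides.

```javascript
// Toy model of the budget checks; the starting budgets are made-up numbers.
function createFunctionState(initialBudget) {
  return { budget: initialBudget, optimized: false };
}

// Called on a JumpLoop: the weight is the backward jump distance.
function onJumpLoop(state, weight) {
  state.budget -= weight;
  if (state.budget < 0) state.optimized = true; // stands in for OptimizeAndOSR()
}

// Called on Return: the weight is the whole bytecode length.
function onReturn(state, bytecodeLength) {
  state.budget -= bytecodeLength;
  if (state.budget < 0) state.optimized = true; // stands in for OptimizeConcurrent()
}

// A loop whose back edge costs 26 per iteration.
const loopState = createFunctionState(1000);
let iterations = 0;
while (!loopState.optimized) {
  onJumpLoop(loopState, 26);
  iterations++;
}
console.log(iterations); // 39

// A small function whose Return costs 30 per call.
const smallState = createFunctionState(100);
let calls = 0;
while (!smallState.optimized) {
  onReturn(smallState, 30);
  calls++;
}
console.log(calls); // 4
```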

Slide 90

Slide 90 text

TurboFan

Slide 91

Slide 91 text

What is TurboFan?
TurboFan is the optimization stack of V8. When optimizing, V8 creates an IR (Intermediate Representation) from the bytecode. TurboFan builds and optimizes that IR.

Slide 92

Slide 92 text

Bytecode → IR → TurboFan Optimization & CodeGeneration

Slide 93

Slide 93 text

IR
An abstract representation of execution blocks. It is called a Control Flow Graph.

Slide 94

Slide 94 text

#22:Branch[None](#21:SpeculativeNumberLessThan, #9:Loop)
#28:IfTrue(#22:Branch)
#30:JSStackCheck(#11:Phi, #32:FrameState, #21:SpeculativeNumberLessThan, #28:IfTrue)
#33:JSLoadGlobal[0x2f3e1c607881 , 1](#11:Phi, #34:FrameState, #30:JSStackCheck, #30:JSStackCheck)
#2:HeapConstant[0x2f3e1c6022e1 ]()
#39:FrameState
#36:StateValues[sparse:^^](#12:Phi, #33:JSLoadGlobal)
#37:FrameState
#35:Checkpoint(#37:FrameState, #33:JSLoadGlobal, #33:JSLoadGlobal)
#38:JSCall[2, 15256, NULL_OR_UNDEFINED](#33:JSLoadGlobal, #2:HeapConstant, #11:Phi, #39:FrameState, #35:Checkpoint, #33:JSLoadGlobal)
#9:Loop(#0:Start, #38:JSCall)

Slide 95

Slide 95 text

Optimization
TurboFan optimizes the graph. Here are some of its optimizations.

Slide 96

Slide 96 text

inline: Inline function calls.
trimming: Remove dead nodes.
type: Type inference.
typed-lowering: Replace expressions with simpler expressions depending on type.
loop-peeling: Move independent expressions outside of the loop.

Slide 97

Slide 97 text

loop-exit-elimination: Remove LoopExit.
load-elimination: Remove useless loads and checks.
simplified-lowering: Simplify operators using more concrete values.
generic-lowering: Convert JS-prefixed calls to simpler calls or stub calls.
dead-code-elimination: Remove dead code.

Slide 98

Slide 98 text

Code generation
Finally, InstructionSelector allocates registers, and CodeGenerator generates assembly from the IR.

Slide 99

Slide 99 text

Deoptimization

Slide 100

Slide 100 text

What is Deoptimization?
Deoptimization means falling back from machine code to bytecode when an unexpected value is passed to the optimized code. Of course, the fewer deoptimizations the better. Let's see an example.

Slide 101

Slide 101 text

const id = x => x;
const test = obj => {
  for (let i = 0; i < 100000; i++) {
    id(obj.x);
  }
};

test({x: 1});
test({x: 1, y: 1});

Slide 102

Slide 102 text

Wrong Map
This example emits optimized assembly for the Map of {x}, but the second call to test passes an object with the Map of {x, y}, so recompilation occurs. Let's look at a bit of the assembly code. Don't be afraid :)

Slide 103

Slide 103 text

0x451eb30464c  8c   48b9f1c7d830391e0000  REX.W movq rcx, 0x1e3930d8c7f1
0x451eb304656  96   483bca                REX.W cmpq rcx,rdx
0x451eb304659  99   0f8591000000          jnz 0x451eb3046f0  ;; Check Map!!
...
0x451eb3046f0  130  e81ff9cfff            call 0x451eb004014  ;; deoptimization bailout 2

Slide 104

Slide 104 text

Bailout
In this way, the emitted code includes Map check code. When deoptimization occurs, execution falls back to bytecode. This is called Bailout.
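
The guard-then-bailout pattern can be imitated in JavaScript. This is a toy sketch, not what TurboFan emits: the ordered key list of an object stands in for its Map, and keyShape, interpretedLoadX, optimizeLoadX, and the bailouts counter are hypothetical names.

```javascript
// Toy model of a Map check with bailout. Assumption: the "Map" is
// approximated by the ordered key list of the object.
function keyShape(obj) {
  return Object.keys(obj).join(",");
}

let bailouts = 0;

// Slow, always-correct path (stands in for running bytecode).
function interpretedLoadX(obj) {
  return obj.x;
}

// Build a "specialized" loader that is only valid for one shape.
function optimizeLoadX(sample) {
  const expectedShape = keyShape(sample);
  return function optimizedLoadX(obj) {
    if (keyShape(obj) !== expectedShape) {
      bailouts++;                   // Map check failed: deoptimize
      return interpretedLoadX(obj); // fall back to the generic path
    }
    return obj.x;                   // fast path, valid for expectedShape only
  };
}

const loadX = optimizeLoadX({ x: 1 });
console.log(loadX({ x: 1 }), bailouts);       // 1 0
console.log(loadX({ x: 2, y: 1 }), bailouts); // 2 1
```

The result stays correct after the bailout; only the fast path is lost.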

Slide 105

Slide 105 text

Summary
This is how V8 executes and optimizes JavaScript. Because of time constraints, GC was omitted. I will write about reading the V8 code on my blog.
http://abcdef.gets.b6n.ch/
Thank you for your attention :))