Slide 1

Slide 1 text

A Journey Into Node.js Internals Tamar Twena-Stern

Slide 2

Slide 2 text

Tamar Twena-Stern • Software engineer, manager, and architect • Architect @PaloAltoNetworks • Was the CTO of my own startup • Passionate about Node.js! • Twitter: @SternTwena

Slide 3

Slide 3 text

Tamar Twena-Stern • On maternity leave • Have 3 kids • Love to play my violin • JavaScript Israel community leader

Slide 4

Slide 4 text

Introduction

Slide 5

Slide 5 text

Traditional Approach (diagram: each client request is handled by its own dedicated thread on the server)

Slide 6

Slide 6 text

Problems • The system allocates CPU and memory resources for every new thread • When the system is stressed, thread scheduling and context switching add overhead • The system wastes resources allocating threads instead of doing actual work

Slide 7

Slide 7 text

Node.js Architecture

Slide 8

Slide 8 text

Node.js Architecture - High Level

Slide 9

Slide 9 text

Now, Let's Get Into The Details

Slide 10

Slide 10 text

Single Threaded? • Not really single threaded • Several threads: • The event loop • The worker thread pool

Slide 11

Slide 11 text

Event Loop Thread • Every request registers a callback, which is executed later on the event loop • The event loop executes JavaScript callbacks • Offloads I/O operations to the worker thread pool • Handles callbacks for asynchronous I/O operations from multiple requests

Slide 12

Slide 12 text

Worker Thread Pool • A thread pool that performs heavy operations • I/O • CPU-intensive operations • Bounded by a fixed capacity • A Node module can submit a task via the libuv API

Slide 13

Slide 13 text

Submitting A Request To The Worker Pool • Use a set of 'basic' modules that work with the event loop • Examples: • fs • dns • crypto • And more • They submit tasks to libuv through a C++ add-on

Slide 14

Slide 14 text

Is The Event Loop Implemented With A Queue?

Slide 15

Slide 15 text

How Is The Worker Pool Implemented?

Slide 16

Slide 16 text

• libuv in a nutshell: – Multi-platform C library – Provides support for async I/O based on an event loop • Supports: – epoll (Linux) – kqueue (macOS) – IOCP (Windows) – Event ports (Solaris)

Slide 17

Slide 17 text

The Event Loop - The Different Phases

Slide 18

Slide 18 text

Event Loop Phases Overview

Slide 19

Slide 19 text

Phase General Mechanism

Slide 20

Slide 20 text

Timers Phase • setTimeout, setInterval • Timer callbacks run as soon as they can be scheduled after the threshold has passed • Timer callback scheduling is controlled by the 'poll' phase
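That "as soon as they can be scheduled" point can be demonstrated (a minimal sketch; the 100 ms and 300 ms values are arbitrary):

```javascript
// A timer's threshold is a minimum, not a guarantee: the callback runs
// only when the timers phase next gets control of the event loop.
const start = Date.now();
setTimeout(() => {
  // Fires well after 100 ms, because the loop was blocked below.
  console.log('timer fired after', Date.now() - start, 'ms');
}, 100);

// Busy-waiting on the event loop thread delays the timers phase.
while (Date.now() - start < 300) { /* block the event loop */ }
```

The callback cannot fire at the 100 ms mark because the event loop never reaches the timers phase until the synchronous code finishes.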

Slide 21

Slide 21 text

I/O Callback Phase • Executes system error callbacks • Example: a TCP socket connection error • Normal I/O operation callbacks are executed in the poll phase

Slide 22

Slide 22 text

Poll phase

Slide 23

Slide 23 text

Check Phase And Close Phase • Check phase - executes callbacks for setImmediate timers • Close phase - handles an abrupt close of a socket or a handle

Slide 24

Slide 24 text

Let's Profile Some Code

Slide 25

Slide 25 text

The JIT Compiler And V8 Engine

Slide 26

Slide 26 text

What Is Just-In-Time Compilation? • Compilation during run time • Combines two approaches: • Ahead-of-time compilation • Interpretation

Slide 27

Slide 27 text

JIT Compiler In Java

Slide 28

Slide 28 text

V8 • Open-source JavaScript engine • Developed originally for Google Chrome and Chromium • Also used in • Couchbase • MongoDB • Node.js

Slide 29

Slide 29 text

V8 Architecture

Slide 30

Slide 30 text

The JIT Compilation In V8 (diagram: Source Code → Interpreter → Function Repeats (Hot Code) → JIT Compiler → Optimised Code, with tracing identifying the hot functions)

Slide 31

Slide 31 text

What Is An Optimising Compiler? • When code is hot, it is worth applying multiple optimisations • Tracing sends it to the optimising compiler • The compiler creates an even faster version of the code • Tracing pulls in the optimised version when the function next runs

Slide 32

Slide 32 text

Ignition Interpreter

Slide 33

Slide 33 text

Ignition Interpreter • The interpreter for V8 • Translates JavaScript to low-level bytecode • Enables the following kinds of code to be stored more compactly as bytecode: • Run-once code • Non-hot code

Slide 34

Slide 34 text

Turbofan Compiler

Slide 35

Slide 35 text

Turbofan Compiler • Makes hot code run as fast as possible • Relies on input type information collected via inline caches while functions run in the Ignition interpreter • Generates the best possible code for the types it has encountered • The fewer input type variations the compiler has to consider, the smaller and faster the resulting code will be

Slide 36

Slide 36 text

How To Help Turbofan Optimise Hot Code? • Fewer function input type variations lead to smaller and faster resulting code • Keep your functions monomorphic, or at least polymorphic • Monomorphic: one input type • Polymorphic: two to four input types • Megamorphic: five or more input types
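The three categories above can be made concrete (a minimal sketch; the function names are made up for illustration):

```javascript
// Monomorphic: add() only ever sees numbers, so TurboFan can specialise
// its hot code down to plain machine addition.
function add(a, b) { return a + b; }
for (let i = 0; i < 100000; i++) add(i, i + 1);

// Megamorphic: five or more input shapes force slower, generic code.
function describe(x) { return typeof x; }
[1, 'one', true, {}, [], null].forEach(describe);

console.log(add(2, 3)); // 5
```

Keeping hot-path helpers like `add` fed with a single type is exactly the "help Turbofan" advice on this slide.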

Slide 37

Slide 37 text

Optimisation And De-optimisation • Optimisation - all assumptions fulfilled - compiled code runs • De-optimisation - not all assumptions fulfilled - compiled code is erased • (Diagram: Assumptions Fulfilled → Optimisation; Assumptions Break → De-optimisation)

Slide 38

Slide 38 text

Avoid De-optimisation • When code is optimised and then de-optimised, it can end up slower than just using the baseline compiled version • Most browsers and engines stop trying after several iterations of optimising and de-optimising
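A common trigger for this cycle is a hidden-class (object shape) change (a hypothetical sketch; `getX` is a made-up function):

```javascript
// After the loop specialises getX for the { x, y } shape, an object
// created with a different property order has a different hidden class,
// which breaks the optimised code's assumption.
function getX(point) { return point.x; }

for (let i = 0; i < 100000; i++) getX({ x: i, y: i }); // monomorphic: optimised
const r = getX({ y: 1, x: 2 }); // new shape: likely de-optimises getX
console.log(r); // 2
```

Running such code under `node --trace-deopt` is one way to see whether V8 actually bails out; keeping object literals in a consistent property order avoids the churn.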

Slide 39

Slide 39 text

De-optimisation Demo

Slide 40

Slide 40 text

V8 Memory Management

Slide 41

Slide 41 text

V8 Memory Structure

Slide 42

Slide 42 text

The Stack • Every executing function pushes its local variables and arguments • Maintains two pointers - the stack pointer and the base pointer • Divided into stack frames • No garbage collection - self-cleaning • Holds the key to the garbage collection process - it starts from the live objects pointed to from the stack
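The per-call frames can be illustrated with a small recursive function (a sketch; `depth` is a made-up name):

```javascript
// Each call pushes a new stack frame holding its arguments and locals;
// the frame, and everything in it, is popped automatically on return.
function depth(n) {
  const local = n * 2; // lives only in this frame
  return n === 0 ? local : depth(n - 1);
}
console.log(depth(10)); // 0 (the innermost frame's local)
```

No garbage collector is involved here: when each frame is popped, its `local` simply ceases to exist.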

Slide 43

Slide 43 text

The Heap Structure

Slide 44

Slide 44 text

Generational GC System • The lifetime of objects determines their place on the heap • New Space - short-lived objects • Old Space - long-lived objects • A scan of New Space is called a 'Scavenge' • Very fast - takes less than a millisecond • A scan of Old Space is called 'Mark-Sweep' • A slower scan
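The two generations map onto allocation patterns like this (a minimal sketch; the counts are arbitrary):

```javascript
// Short-lived objects die young in New Space and are reclaimed by fast
// Scavenge scans; objects we keep reachable survive the scans and are
// eventually promoted to Old Space, where slower Mark-Sweep scans run.
const retained = [];
for (let i = 0; i < 100000; i++) {
  const tmp = { i };                       // dies young: Scavenge reclaims it
  if (i % 1000 === 0) retained.push(tmp);  // stays reachable: promoted later
}
console.log('retained:', retained.length); // 100
```

Most of the 100,000 temporaries never leave New Space, which is exactly why the cheap Scavenge scan pays off.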

Slide 45

Slide 45 text

Object Life On The Heap (diagram: object allocated in New Space → 'Scavenge' scan detects live objects in New Space → object survives two 'Scavenge' scans and is still alive → moved to Old Space → more space needed in Old Space → Mark-Sweep scan starts to free space on the heap)

Slide 46

Slide 46 text

Scavenge Scan Explained

Slide 47

Slide 47 text

Mark Sweep For The Following Code

Slide 48

Slide 48 text

Mark Sweep - DFS

Slide 49

Slide 49 text

Mark Sweep - DFS

Slide 50

Slide 50 text

What's Allocated Where?

Slide 51

Slide 51 text

Let's Find Some Memory Leaks!
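One of the most common leak shapes in Node.js services is an unbounded cache (a hypothetical sketch; `handleRequest` and the sizes are made up):

```javascript
// Entries are added per request and never evicted, so they stay
// reachable forever and accumulate in Old Space.
const cache = new Map();

function handleRequest(id, payload) {
  cache.set(id, payload); // never deleted: leaks under unique ids
  return payload;
}

for (let i = 0; i < 1000; i++) handleRequest(i, Buffer.alloc(1024));
console.log('cache entries:', cache.size); // 1000
```

Because every entry stays reachable from the module-level `cache`, Mark-Sweep can never reclaim it; bounding the cache (e.g. with an LRU eviction policy) is the usual fix, and a heap snapshot makes this growth pattern easy to spot.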

Slide 52

Slide 52 text

• Twitter: @SternTwena