Slide 1

Runtime Model of Ruby, JavaScript, Erlang, and other popular languages
November 2018, Kyiv

Slide 2

Sergii Boiko
Full Stack Engineer

Slide 3

Why Runtime Model?

Slide 4

Runtime Model

Key factors: CPU-bound and IO-bound

Key components:
- Bare Performance
- Parallelism
- Memory Management
- IO Mode
- Concurrency

Outcomes:
- Throughput: how many requests we can handle
- Maximal number of concurrent connections
- Responsiveness: how predictable the response time is

Slide 5

CPU-bound

Slide 6

CPU-bound

CPU-bound: the time to complete a task is determined by the speed of the CPU.

Key components:
- Bare Performance
- Parallelism
- Memory Management

Examples:
- compiling assets
- building Ruby during installation
- resizing images
- creating ActiveRecord objects

Slide 7

CPU-bound: Bare Performance

The closer to bare metal, the faster.

Statically typed languages:
- AOT compilation: C, C++, Rust, Swift, Go, OCaml, Haskell (slower in practice due to laziness)
- JIT compilation: JVM (Java, Scala, Kotlin), .NET (C#, F#)

Dynamic languages:
- JIT (~2x slower than statically typed): JavaScript V8, Clojure, JRuby
- non-JIT (~10x slower than statically typed): Erlang/Elixir (JIT is WIP), Ruby MRI (JIT is WIP), Python
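The rough multipliers above come from comparing tight CPU-bound loops across runtimes. A minimal sketch of such a micro-benchmark in Ruby (the loop body and iteration count are illustrative choices, not figures from the slides):

```ruby
require "benchmark"

# A tight CPU-bound loop: the kind of workload where the "~10x slower
# than statically typed languages" estimate for non-JIT runtimes shows up.
def busy_sum(n)
  sum = 0
  i = 0
  while i < n
    sum += i
    i += 1
  end
  sum
end

result = nil
elapsed = Benchmark.realtime { result = busy_sum(1_000_000) }
puts format("summed 1M integers in %.3fs", elapsed)
```

Running the same loop in C, JRuby, and MRI is a crude but honest way to reproduce the AOT/JIT/non-JIT ordering.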

Slide 8

CPU-bound: Parallelism

Parallelism: simultaneous execution of computations. Boils down to using all available CPU cores.

Parallel:
- Rust
- Go
- Haskell
- Erlang/Elixir
- C, C++
- Swift
- JVM (Java, Scala, Kotlin, Clojure, JRuby)
- .NET (C#, F#)

Non-parallel:
- Ruby MRI
- Python
- JavaScript/Node.JS
- OCaml
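MRI's place in the non-parallel column comes from its Global VM Lock (GVL): Thread objects exist, but only one of them runs Ruby code at any moment. A small sketch (the thread count and loop size are arbitrary):

```ruby
# Four threads each do CPU-bound counting. On MRI the GVL interleaves
# them on one core; on JRuby the very same code uses all four cores.
def count_up(n)
  i = 0
  i += 1 while i < n
  i
end

threads = 4.times.map { Thread.new { count_up(500_000) } }
results = threads.map(&:value)
puts results.sum  # => 2000000
```

The code is correct either way; only the wall-clock time reveals whether the runtime is parallel.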

Slide 9

CPU-bound: Memory Management

Rough estimate: ~10% performance penalty when using a tracing GC.

Non-GC:
- C (manual)
- Rust (automatic)
- C++ (RAII)

Reference counting:
- Swift (ARC)
- C++ (smart pointers)
- Perl 5
- Python

Tracing garbage collector:
- JVM
- .NET
- Ruby MRI
- Python (cycle collection)
- JavaScript V8
- Erlang/Elixir
- Haskell
- OCaml
- Go
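On MRI the tracing collector can be observed from Ruby itself via `GC.stat`. A quick sketch (the allocation count is an arbitrary number chosen to force at least one minor collection):

```ruby
# Allocating many short-lived objects triggers MRI's generational
# ("minor") collections -- the source of the GC pauses discussed later.
before = GC.stat(:minor_gc_count)
300_000.times { Object.new }
after = GC.stat(:minor_gc_count)
puts "minor GCs during allocation: #{after - before}"
```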

Slide 10

IO-bound

Slide 11

IO-bound

IO-bound: how many "simultaneous" interactions with the outside world can we handle?

Key components:
- IO Mode
- Concurrency
- Memory Management

Examples:
- IRB/Pry input/output
- reading file content
- handling a web request
- handling a websocket connection
- performing a database query
- calling a remote service
- reading data from Redis
- sending email

Slide 12

IO Mode: Blocking vs Non-Blocking

Synchronous (blocking): waits for the other side to be ready for the IO interaction. Examples: Ruby, Java.

Asynchronous (non-blocking): handles other IO interactions until the other side is ready to interact. Examples: Node.JS, Python + asyncio.
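The difference is visible even on a single pipe in Ruby, which supports both modes. A toy sketch (not from the slides):

```ruby
r, w = IO.pipe

# Non-blocking mode: instead of waiting, the read raises IO::WaitReadable
# when no data is available yet, so the caller can go handle other IO.
begin
  r.read_nonblock(16)
rescue IO::WaitReadable
  puts "nothing to read yet, doing other work"
end

w.write("hello")
w.close

got = r.read  # blocking mode: waits until data (or EOF) arrives
puts got
```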

Slide 13

Concurrency

Concurrency: the ability of an application to run several tasks virtually at the same time.

Models:
1. Blocking IO + OS Threads/Processes
2. Event Loop
3. Green Threads

Slide 14

Blocking IO + OS Threads

The runtime blocks on any IO operation, and the OS handles switching to another thread.

Pros:
- easy to write per-thread logic: everything is sequential
- quite performant: practical limit of roughly 5000 concurrent threads
- memory-efficient compared to processes: everything is shared

Cons:
- each thread requires ~2Mb of memory for its stack
- shared state is a big issue for languages with mutable data
- requires various thread-synchronization primitives
- high quality requirements for third-party libraries

Examples:
- JVM
- .NET
- C/C++
- Rust (the type system handles synchronization and shared memory)
- Ruby MRI (possible, but not used in practice)
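Even on MRI this model works for IO, because the GVL is released while a thread is blocked on an IO call, letting the OS schedule another thread. A minimal sketch with one handler thread (a toy server, not from the slides):

```ruby
require "socket"

# One thread blocks in accept while the main thread connects to it.
server = TCPServer.new("127.0.0.1", 0)  # port 0: let the OS pick a free port
port = server.addr[1]

handler = Thread.new do
  client = server.accept                 # blocks until a client connects
  client.puts "hello from the handler thread"
  client.close
end

sock = TCPSocket.new("127.0.0.1", port)
reply = sock.gets                        # blocking read of the reply line
handler.join
puts reply
```

Scaling this to one OS thread per connection is exactly where the ~5000-thread and 2Mb-per-stack limits start to bite.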

Slide 15

Blocking IO + OS Processes

The runtime blocks on any IO operation, and the OS handles switching to another process.

Pros:
- isolated memory: no risk of simultaneous writes to the same memory
- sequential, "blocking" code

Cons:
- higher memory consumption compared to threads
- lower performance compared to asynchronous mode

Examples:
- Unicorn, a Ruby web server
- Postgres, for handling client connections
- Apache (in one of its modes)
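The memory isolation is easy to demonstrate with `fork` (which is how Unicorn spawns its workers; note `fork` is unavailable on Windows and JRuby). A sketch:

```ruby
# After fork, parent and child have fully isolated copies of memory:
# the child's mutation never reaches the parent.
message = "parent data"

pid = fork do
  message << " mutated by child"  # changes only the child's copy
end
Process.wait(pid)

puts message  # => "parent data"
```

This isolation is the "pro" above, and the per-process copy of everything mutable is the memory-consumption "con".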

Slide 16

Event Loop

Slide 17

Event Loop

Callbacks are put into a queue and executed one by one.

Pros:
- memory-efficient: memory is shared
- memory-safe: no race conditions, because only one callback runs at a time
- high-performance: can potentially handle millions of connections

Cons:
- callbacks must have short execution times so they don't block the main loop
- single-threaded: only one CPU core is used
- callback/promise hell, though it can be avoided with coroutines and async/await

Examples:
- Node.JS
- Python + asyncio or Twisted
- Ruby + EventMachine
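The core of the model fits in a few lines of Ruby on top of `IO.select`. This is a toy single-iteration sketch of what Node.JS or EventMachine do internally, greatly simplified:

```ruby
# Register a callback per IO source, wait for readiness, dispatch.
r, w = IO.pipe
log = []
callbacks = { r => ->(io) { log << "got: #{io.read_nonblock(64)}" } }

w.write("tick")
w.close

# One turn of the loop: block (up to 1s) until some registered IO is
# readable, then run the callback associated with it.
ready, = IO.select(callbacks.keys, nil, nil, 1)
ready.each { |io| callbacks[io].call(io) }
puts log
```

A real loop repeats this forever, which is why a single slow callback stalls every other connection.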

Slide 18

Synchronous Asynchronicity: Green Threads

Instead of using OS threads, the runtime has its own scheduler and manages threads without the OS:
- the API looks synchronous
- but under the hood everything runs asynchronously

Benefits compared to OS Threads:
- small memory usage per thread (only ~2Kb in Erlang and Go)
- cheap context switches
- high performance: can potentially handle millions of connections

Benefits compared to an Event Loop:
- simpler execution model: sequential "blocking" code
- can use all CPU cores
- softer restrictions on maximum execution time

Examples:
- Erlang/Elixir (processes/actors and message passing)
- Go (goroutines and channels)
- Haskell (MVar and STM)
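Erlang and Go schedule their green threads preemptively; Ruby's Fibers only switch cooperatively, but they still illustrate the cheap userspace context switch. A sketch with a toy round-robin scheduler of our own (not a real runtime scheduler):

```ruby
order = []

# Three cooperative "green threads"; each yields control after every step.
fibers = 3.times.map do |i|
  Fiber.new do
    2.times do |step|
      order << "task#{i}-step#{step}"
      Fiber.yield  # userspace context switch: no OS involvement
    end
  end
end

# Toy round-robin scheduler: keep resuming live fibers until all finish.
while fibers.any?(&:alive?)
  fibers.each { |f| f.resume if f.alive? }
end

puts order.join(", ")
```

The interleaved output shows sequential-looking code being multiplexed by the runtime, which is the whole trick behind "synchronous asynchronicity".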

Slide 19

Memory Management: tracing GC impact on IO

1. Leads to GC "pauses" in execution.
   - Only Erlang/Elixir and Go have optimizations to prevent or mitigate this issue.
   - Alternatively, use runtimes with non-tracing GC (C, C++, Rust, Swift, Perl 5).
2. There is a maximum heap size that can be handled efficiently, which limits the maximum number of connections.
   - Erlang/Elixir beats other VMs with its ability to manage >100Gb heaps and handle ~2 million connections.
   - Hypothesis: Perl 5 (due to reference counting) can also handle big heaps.

Slide 20

Runtime Model of Ruby MRI 2.5

CPU-bound:
- dynamic, non-JIT (JIT is WIP), non-parallel => CPU-bound performance is poor
- GC: generational mark & sweep (good), but GC pauses delay execution

IO-bound:
- blocking, single-threaded, multi-process (Unicorn)
- => IO-bound work is inefficient compared to Node.JS, Erlang, or Go: more than 5-10x slower

Slide 21

Ruby MRI: New Hope - Guilds

A Guild is a set of Threads and Fibers which can't directly access the memory of another Guild.

Slide 22

Ruby MRI: New Hope - Guilds

Promised to be delivered within one year! Expected performance gain: ~3-10x!

Pros:
- memory-efficient compared to processes: immutable data is shared (code, frozen objects)
- memory-safe: different Guilds can't simultaneously mutate the same object
- good enough performance for web requests
- parallel: Guilds don't share a GIL

Cons:
- still can't handle a big number of connections
- Guilds are Threads in disguise, though without the mutex hassle
- compared to an Event Loop or Green Threads:
  - memory usage is higher
  - context switches are slower

Slide 23

Other factors contributing to the Runtime Model

- data structures: mutable or immutable, O(?)
- GC implementation details
- heap and stack usage
- eager or lazy evaluation
- memory model
- CPU architecture
- underlying OS system calls (pthreads, select, epoll/kqueue, etc.)
- ...

Slide 24

Follow us on twitter: @railsware