
Sergii Boiko: "Runtime Model of Ruby, JavaScript, Erlang, and Other Popular Languages"

Railsware
November 03, 2018


Recently Sergii Boiko spoke at the Ruby Meditation meetup in Kyiv, Ukraine. He explained how to make an informed choice of technology stack for new projects. Drawing on his personal experience, Sergii reviewed several popular runtimes, including Ruby, Node.js, Python, Erlang/Elixir, JVM, .NET, and Golang.


Transcript

  1. Runtime Model
     Key Factors: CPU-bound and IO-bound
     Key Components: Bare Performance, Parallelism, Memory Management, IO Mode, Concurrency
     Outcomes:
     - Throughput: how many requests we can handle
     - Maximal number of concurrent connections
     - Responsiveness: how predictable the response time is
  2. CPU-bound
     CPU-bound: time to complete a task is determined by the speed of the CPU
     Key Components: Bare Performance, Parallelism, Memory Management
     Examples:
     - compiling assets
     - building Ruby during installation
     - resizing images
     - creating ActiveRecord objects
     (see the Ruby sketch below)
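A minimal Ruby illustration of a CPU-bound task (the workload and iteration count are arbitrary): the elapsed time depends on how fast the CPU can execute the arithmetic, not on any disk or network.

```ruby
# CPU-bound sketch: elapsed time tracks CPU speed, not IO latency.
require 'benchmark'

elapsed = Benchmark.realtime do
  (1..2_000_000).reduce(0.0) { |sum, n| sum + Math.sqrt(n) }
end
puts "CPU-bound work took #{elapsed.round(2)}s"
```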
  3. CPU-bound: Bare Performance
     The closer to bare metal, the faster.
     Statically typed languages:
     - AOT compilation: C, C++, Rust, Swift, Go, OCaml, Haskell (slower in practice due to laziness)
     - JIT compilation: JVM (Java, Scala, Kotlin), .NET (C#, F#)
     Dynamic languages:
     - JIT (~2x slower than statically typed): JavaScript V8, Clojure, JRuby
     - non-JIT (~10x slower than statically typed): Erlang/Elixir (JIT is a work in progress), Ruby MRI (JIT is a work in progress), Python
  4. CPU-bound: Parallelism
     Parallelism: simultaneous execution of computations. Boils down to using all available CPU cores.
     Parallel: Rust, Go, Haskell, Erlang/Elixir, C, C++, Swift, JVM (Java, Scala, Kotlin, Clojure, JRuby), .NET (C#, F#)
     Non-Parallel: Ruby MRI, Python, JavaScript/Node.JS, OCaml
     (a Ruby sketch below shows why MRI sits in the non-parallel column)
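A hedged Ruby sketch of why MRI counts as non-parallel for CPU-bound work: the Global VM Lock lets only one thread execute Ruby code at a time, so spreading the same computation over threads gives roughly no speedup (exact numbers will vary by machine).

```ruby
# Ruby MRI: threads do not speed up CPU-bound work because of the GVL.
require 'benchmark'

def burn_cpu
  x = 0
  2_000_000.times { x += 1 }
  x
end

sequential = Benchmark.realtime { 4.times { burn_cpu } }
threaded   = Benchmark.realtime { 4.times.map { Thread.new { burn_cpu } }.each(&:join) }

puts "sequential: #{sequential.round(2)}s"
puts "threaded:   #{threaded.round(2)}s  # about the same on MRI; faster on JRuby"
```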
  5. CPU-bound: Memory Management
     Rough estimate: ~10% performance penalty when using a tracing GC
     Non-GC: C (manual), Rust (automatic), C++ (RAII)
     Reference counting: Swift (ARC), C++ (smart pointers), Perl 5, Python
     Tracing garbage collector: JVM, .NET, Ruby MRI, Python (cycle collection), JavaScript V8, Erlang/Elixir, Haskell, OCaml, Go
  6. IO-bound
     IO-bound: how many "simultaneous" interactions with the outer world can we handle?
     Key Components: IO Mode, Concurrency, Memory Management
     Examples:
     - IRB/Pry input/output
     - reading file content
     - handling a web request
     - handling a websocket connection
     - performing a database query
     - calling a remote service
     - reading data from Redis
     - sending email
     (see the Ruby sketch below)
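For contrast with the CPU-bound sketch above, a minimal IO-bound example in Ruby (the URL is just a placeholder): almost all of the elapsed time is spent waiting on the network, not computing.

```ruby
# IO-bound sketch: elapsed time is dominated by waiting on the outside world.
require 'benchmark'
require 'net/http'

elapsed = Benchmark.realtime do
  Net::HTTP.get_response(URI('https://example.com/'))
end
puts "IO-bound request took #{elapsed.round(2)}s (mostly waiting)"
```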
  7. IO Mode: Blocking vs Non-Blocking
     Synchronous or Blocking: waits for the other side to be ready for the IO interaction. Examples: Ruby, Java
     Asynchronous or Non-Blocking: handles other IO interactions until the other side is ready to interact. Examples: Node.JS, Python + asyncio
     (see the Ruby sketch below)
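Both modes are available at the socket level; here is a hedged Ruby sketch of the difference (host and byte counts are illustrative). The blocking call parks the thread until data arrives; the non-blocking call returns immediately and lets the caller do other work until the socket becomes readable.

```ruby
# Blocking vs non-blocking reads on a plain TCP socket in Ruby.
require 'socket'

sock = TCPSocket.new('example.com', 80)
sock.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

# Blocking: the calling thread sleeps until some data arrives.
# data = sock.readpartial(1024)

# Non-blocking: raises IO::WaitReadable if nothing is ready yet.
begin
  data = sock.read_nonblock(1024)
  puts data.byteslice(0, 64)
rescue IO::WaitReadable
  IO.select([sock])   # in a real program we would do other work here
  retry
end
```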
  8. Concurrency
     Concurrency: ability of an application to run several tasks virtually at the same time.
     Models:
     1. Blocking IO + OS Threads/Processes
     2. Event Loop
     3. Green Threads
  9. Blocking IO + OS Threads
     The runtime blocks on any IO operation, and the OS handles the switch to another thread.
     Pros:
     - easy to write the logic per thread: everything is sequential
     - quite performant: max limit of roughly 5,000 concurrent threads
     - memory-efficient compared to processes: everything is shared
     Cons:
     - each thread requires 2 MB of memory for its stack
     - shared state is a big issue for mutable languages
     - requires the use of different thread synchronization primitives
     - high demands on the quality of third-party libraries
     Examples: JVM, .NET, C/C++, Rust (the type system handles synchronization and shared memory), Ruby MRI (possible, but not used in practice)
     (see the Ruby sketch below)
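A minimal thread-per-task sketch in Ruby (URLs are placeholders): each thread blocks on its own network call while the OS schedules the others; MRI releases the GVL during IO, so this works even there.

```ruby
# Blocking IO + OS threads: one thread per request, sequential code inside each.
require 'net/http'

urls = %w[https://example.com https://example.org https://example.net]

threads = urls.map do |url|
  Thread.new do
    response = Net::HTTP.get_response(URI(url))  # the thread blocks here
    [url, response.code]
  end
end

threads.each { |t| p t.value }  # wait for every thread and print its result
```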
  10. Blocking IO + OS Processes
      The runtime blocks on any IO operation, and the OS handles the switch to another process.
      Pros:
      - isolated memory: no risk of simultaneous writes to the same memory
      - sequential, "blocking" code
      Cons:
      - higher memory consumption compared to threads
      - lower performance compared to asynchronous mode
      Examples: Unicorn (Ruby web server), Postgres (handling client connections), Apache (in one of its modes)
      (see the Ruby sketch below)
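A rough, Unicorn-style preforking sketch in Ruby (this is not Unicorn's actual code, only the shape of the model): the parent opens one listening socket, forks a few workers, and each worker blocks in accept while the OS distributes connections.

```ruby
# Blocking IO + OS processes: a toy preforking server.
require 'socket'

server = TCPServer.new(9292)

3.times do
  fork do
    loop do
      client = server.accept  # blocking accept; the OS schedules the workers
      client.gets             # read and ignore the request line
      client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
      client.close
    end
  end
end

Process.waitall  # keep the parent alive while the workers run
```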
  11. Event Loop
      Callbacks are put into a queue and executed one by one.
      Pros:
      - memory-efficient: shared memory
      - memory-safe: no race conditions, because only one callback runs at a time
      - highly performant: can potentially handle millions of connections
      Cons:
      - callbacks must have a short execution time so they do not block the main loop
      - single-threaded: only one CPU core is used
      - callback/promise hell, though it can be avoided with coroutines and async/await
      Examples: Node.JS, Python + asyncio or Twisted, Ruby + EventMachine
      (see the Ruby sketch below)
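A minimal Ruby + EventMachine sketch of the event-loop model (requires the eventmachine gem; the port and timer interval are arbitrary): the reactor runs on one thread and invokes our callbacks, so each callback has to return quickly.

```ruby
# Event loop with Ruby + EventMachine: a single-threaded reactor driving callbacks.
require 'eventmachine'

module Echo
  def receive_data(data)  # callback fired by the loop when data arrives
    send_data(data)       # queue a non-blocking write back to the client
  end
end

EM.run do
  EM.start_server('127.0.0.1', 9000, Echo)
  EM.add_periodic_timer(5) { puts 'event loop is still spinning' }
end
```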
  12. Synchronous Asynchronicity: Green Threads
      Instead of using OS threads, the runtime has its own scheduler and manages threads without the OS.
      The API looks synchronous, but under the hood everything runs asynchronously.
      Benefits compared to OS threads:
      - small memory usage per thread (only ~2 KB for Erlang and Go)
      - cheap context switch
      - highly performant: can potentially handle millions of connections
      Benefits compared to the event loop:
      - simpler execution model: sequential, "blocking" code
      - can use all CPU cores
      - softer restrictions on maximum execution time
      Examples: Erlang/Elixir (processes/actors and message passing), Go (goroutines and channels), Haskell (MVar and STM)
      (see the toy Ruby scheduler below)
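Green threads are not something Ruby MRI offers, so this is only a toy illustration of the idea, built on Ruby Fibers: lightweight tasks multiplexed by a scheduler that lives inside the runtime rather than the OS. Real green-thread runtimes (BEAM, Go) additionally hook IO readiness and preemption into such a scheduler, which is what makes blocking-looking code non-blocking in practice.

```ruby
# Toy cooperative scheduler: runtime-managed lightweight tasks (not OS threads).
require 'fiber'

class ToyScheduler
  def initialize
    @queue = []
  end

  def spawn(&block)
    @queue << Fiber.new(&block)        # each task is a cheap Fiber, not an OS thread
  end

  def run
    until @queue.empty?
      fiber = @queue.shift
      fiber.resume                     # run the task until it yields control...
      @queue << fiber if fiber.alive?  # ...and requeue it if it has not finished
    end
  end
end

scheduler = ToyScheduler.new
2.times do |i|
  scheduler.spawn do
    3.times { |step| puts "task #{i}, step #{step}"; Fiber.yield }
  end
end
scheduler.run
```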
  13. Memory Management: tracing GC impact on IO
      1. Leads to GC "pauses" in execution
         - only Erlang/Elixir and Go have optimizations to prevent/mitigate this issue
         - or use other, non-tracing-GC runtimes (C, C++, Rust, Swift, Perl 5)
      2. There is a maximum heap size that can be handled efficiently, which limits the maximum number of connections
         - Erlang/Elixir beats other VMs with its ability to manage a >100 GB heap and handle ~2 million connections
         - hypothesis: Perl 5 (due to reference counting) can also handle big heaps
      (see the Ruby GC sketch below)
  14. Runtime Model of Ruby MRI 2.5
      CPU-bound: dynamic, non-JIT (JIT is a work in progress), non-parallel => CPU-bound performance is poor
      GC: generational mark & sweep (good), but there are delays in execution due to GC pauses
      IO-bound: blocking, single-threaded, multi-process (Unicorn) => IO-bound work is not efficient compared to Node.JS, Erlang, or Go: more than 5-10x slower
  15. Ruby MRI: New Hope - Guilds
      A Guild is a set of Threads and Fibers which can't directly access the memory of another Guild.
  16. Ruby MRI: New Hope - Guilds
      Promised to be delivered within one year! Expected performance gain: ~3-10x!
      Pros:
      - memory-efficient compared to processes: immutable things are shared (code, frozen objects)
      - memory-safe: different Guilds can't simultaneously mutate the same object
      - good enough performance for web requests
      - parallel: Guilds don't have a GIL
      Cons:
      - still can't handle a big number of connections
      Guilds are Threads in disguise, but without the hassle of mutexes.
      Compared to the Event Loop or Green Threads:
      - memory usage is higher
      - context switch is slower
  17. Other factors contributing to the Runtime Model
      - Data structures: mutable or immutable, O(?) complexity
      - GC implementation details
      - Heap and stack usage
      - Eager or lazy evaluation
      - Memory model
      - CPU architecture
      - Underlying OS system calls (pthreads, select, epoll/kqueue, etc.)
      - ...