RuPy Campinas 2016 - Designing concurrent applications

Renan Ranelli

June 18, 2016

Transcript

  1. AGENDA • Why concurrency matters • The “stages” of applications regarding concurrency and distribution • Concurrency & distribution is hard • How Ruby deals with it • Elixir & Erlang introduction • How Erlang-land stuff impacts the way we design such systems • Conclusions
  2. DISCLAIMER • This talk will give you more questions than answers • I absolutely won’t tell you “what to do” • I will assume “some” knowledge of the backend stuff discussed today • I will not show you a {case,example} so that we will have more time to talk about stuff
  3. WHAT’S HAPPENING IN HARDWARE-LAND? In summary: –CPU clock speeds are no longer growing exponentially –But we are still getting exponentially more transistors (more cores!) –No substantial gains in sequential performance* –Concurrency is the next major revolution in how we write software
  4. In 2005: –Java 5 was hot news. –Windows XP. Vista only in 2007. –No AWS, Twitter, Netflix. –Ruby on Rails 1.0 only in December. –YouTube was just founded. –You had never heard of Justin Bieber. –NO STACK OVERFLOW.
  5. FUNCTIONAL PROGRAMMING • Every* programming language created since then had *concurrency* as a major focus: –Scala, Clojure –Elixir –Go, Rust –Kotlin, Nim, Pony, Crystal, etc… –… Erlang and Haskell’s rediscovery
  6. THE STAGES OF OUR APPS • 1 – Very low load application (10s of page views/sec) • 2 – Small, has users. Now we need to take some things to the background • 3 – Some complex and “long” async workflows. A *big* machine still solves our issues • 4 – Tons of machines, apps, developers and multiple data stores. Requires lots of coordination and distribution. Real-time & stateful stuff. A “real backend”
  7. THE STAGES OF OUR APPS • 1 – Very low load application (10s of page views/sec) • 2 – Small, has users. Now we need to take some things to the background • 3 – Some complex and “long” async workflows. A *big* machine still solves our issues • 4 – Tons of machines, apps, developers and multiple data stores. Requires lots of coordination and distribution. Real-time & stateful stuff. A “real backend” THIS IS HELL HARD
  8. WHERE DO WE NEED TO DEAL WITH IT? • You need to be aware of concurrency in the data tier • Reasoning about “Time” and ordering sucks. • System operation is hard.
  9. THE DATA TIER IN A DISTRIBUTED SYSTEM • Your default database configuration probably won’t shield you against race conditions (!)
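
The race meant here is typically a "lost update": two clients read the same row, each computes a new value, and the second write silently overwrites the first. Below is a minimal sketch in Ruby, assuming a hypothetical in-memory `Store` class that stands in for one database row with a version column. `Store`, `write_if_version`, and the balance figures are illustrative assumptions, not a real database API.

```ruby
# Hypothetical in-memory stand-in for a single database row.
# It is NOT a real database client; it only mimics read and write.
class Store
  Row = Struct.new(:balance, :version)

  def initialize(balance)
    @row = Row.new(balance, 0)
  end

  # Returns a snapshot, like a plain SELECT.
  def read
    @row.dup
  end

  # Blind write, like an UPDATE with no extra condition.
  def write(balance)
    @row = Row.new(balance, @row.version + 1)
    true
  end

  # Conditional write, like UPDATE ... WHERE version = ?.
  # Returns false when another writer got there first.
  def write_if_version(balance, expected_version)
    return false unless @row.version == expected_version

    write(balance)
  end
end

# Two clients both read, then both write: one deposit is silently lost.
def lost_update
  store = Store.new(100)
  a = store.read
  b = store.read                 # interleaved read sees the same balance
  store.write(a.balance + 10)
  store.write(b.balance + 20)    # overwrites the +10
  store.read.balance
end

# Same interleaving, but the stale writer is rejected and retries once.
def optimistic_update
  store = Store.new(100)
  a = store.read
  b = store.read
  store.write_if_version(a.balance + 10, a.version)
  unless store.write_if_version(b.balance + 20, b.version)
    fresh = store.read
    store.write_if_version(fresh.balance + 20, fresh.version)
  end
  store.read.balance
end
```

With these numbers `lost_update` ends at 120 instead of 130, while `optimistic_update` ends at 130. A version check is only one option; stricter isolation levels or row locks are others, each with its own cost.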
  11. THE DATA TIER IN A DISTRIBUTED SYSTEM • MongoDB and its isolation levels. • Mongo basically screws up every guarantee it claims to give you. Kyle Kingsbury (@aphyr) tells you not to use it. See the posts below for more info: https://aphyr.com/posts/322-jepsen-mongodb-stale-reads https://aphyr.com/posts/284-jepsen-mongodb
  12. THE DATA TIER IN A DISTRIBUTED SYSTEM • When dealing with Amazon’s S3: • It is safe to “read after create”. • It is not safe to “read after update”. Guess who got a production issue because of that…
  13. TIME IS NOT WHAT YOU EXPECT • Clocks drift (hence vector clocks) • Locks are hard. Counters are hard.
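
Because wall clocks drift, vector clocks order events by counting them per node instead of reading the time. The following is a minimal Ruby sketch of the idea, not a production implementation. The `VectorClock` class and its method names are assumptions made for illustration.

```ruby
# Minimal vector clock: one event counter per node id.
class VectorClock
  attr_reader :counters

  def initialize(counters = {})
    @counters = counters.dup
  end

  # Local event on `node`: bump that node's counter.
  def tick(node)
    VectorClock.new(@counters.merge(node => @counters.fetch(node, 0) + 1))
  end

  # On receiving a message: take the element-wise maximum of both clocks.
  def merge(other)
    VectorClock.new(@counters.merge(other.counters) { |_node, a, b| [a, b].max })
  end

  # True when every counter here is <= the other's and at least one is smaller.
  def happened_before?(other)
    nodes = @counters.keys | other.counters.keys
    nodes.all? { |n| @counters.fetch(n, 0) <= other.counters.fetch(n, 0) } &&
      nodes.any? { |n| @counters.fetch(n, 0) < other.counters.fetch(n, 0) }
  end

  # Neither clock precedes the other and they differ: the events are concurrent.
  def concurrent_with?(other)
    counters != other.counters &&
      !happened_before?(other) &&
      !other.happened_before?(self)
  end
end

a1 = VectorClock.new.tick(:a)   # event on node a
b1 = VectorClock.new.tick(:b)   # independent event on node b
a2 = a1.merge(b1).tick(:a)      # node a receives b's message, then acts
```

Here `a1` and `b1` are concurrent (no causal order can be established), while both happened before `a2`. No timestamps are involved, so clock drift cannot affect the result.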
  14. TIME IS NOT WHAT YOU EXPECT • Two things are hard about distributed systems: • 2. ensure messages are “delivered once” • 1. ensure messages are “delivered in order” • 2. ensure messages are “delivered once” • That’s why Distributed Locks & Counters are hard.
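
A common way to cope is to accept duplicated and reordered delivery and make the receiver tolerant of both. The Ruby sketch below assumes each message carries a unique `:id` and a per-sender `:seq` number. Both fields and the `OrderedConsumer` class are hypothetical, chosen only to illustrate the idea.

```ruby
require 'set'

# Hypothetical consumer that tolerates duplicated and reordered messages.
# Each message carries an :id (for de-duplication) and a :seq (for ordering).
class OrderedConsumer
  attr_reader :applied

  def initialize
    @seen_ids = Set.new   # ids already accepted
    @next_seq = 1         # sequence number we are waiting for
    @pending  = {}        # early arrivals, keyed by seq
    @applied  = []        # payloads in the order they were applied
  end

  def receive(message)
    return if @seen_ids.include?(message[:id])  # duplicate: drop it

    @seen_ids << message[:id]
    @pending[message[:seq]] = message
    # Apply every message that is now contiguous with what we already applied.
    while (ready = @pending.delete(@next_seq))
      @applied << ready[:payload]
      @next_seq += 1
    end
  end
end

consumer = OrderedConsumer.new
consumer.receive(id: 'b', seq: 2, payload: 'second')  # arrives early, buffered
consumer.receive(id: 'a', seq: 1, payload: 'first')   # unblocks both
consumer.receive(id: 'b', seq: 2, payload: 'second')  # redelivery, ignored
consumer.receive(id: 'c', seq: 3, payload: 'third')
```

This gives "effectively once, in order" processing on top of at-least-once delivery. The sketch keeps every seen id forever and stalls if a sequence number never arrives, which is exactly the kind of detail that makes the real problem hard.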
  15. OPERATION IS HARD • Failure is all too frequent. • Metrics, alerts, log aggregation are a must-have. • You need to handle back-pressure & timeouts when integrating systems • Releasing a new version is *not* instantaneous. You gotta deal with different live versions at the same time. • You need to be able to easily “inspect” the whole system when debugging. SSH-ing into individual nodes won’t cut it.
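
As a rough illustration of the back-pressure and timeout bullet, the Ruby sketch below uses only the standard library. `BoundedInbox` and `call_with_timeout` are hypothetical names; the point is that a full queue rejects work instead of growing without limit, and that a slow dependency is abandoned instead of awaited forever.

```ruby
require 'timeout'

# Hypothetical bounded inbox: when it is full, new work is rejected,
# which pushes the pressure back onto the caller.
class BoundedInbox
  def initialize(capacity)
    @queue = SizedQueue.new(capacity)
  end

  # Returns :accepted, or :rejected when the inbox is full.
  def offer(job)
    @queue.push(job, true)   # non-blocking push raises ThreadError when full
    :accepted
  rescue ThreadError
    :rejected
  end

  def size
    @queue.size
  end
end

# Runs the block, giving up after `seconds` instead of waiting forever.
def call_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

inbox = BoundedInbox.new(2)
inbox.offer(:job1)
inbox.offer(:job2)
inbox.offer(:job3)                   # inbox is full, so this one is rejected
call_with_timeout(0.05) { sleep 1 }  # slow call is cut off
```

`Timeout.timeout` interrupts the block from another thread and has well-known caveats, so in real integrations the timeout options of the HTTP or database client are usually a safer choice. The sketch only shows the shape of the idea.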
  16. RUBY (MRI) & CONCURRENCY • Ruby’s runtime is *slow*. • The GIL blocks any chance of achieving “parallelism” in Ruby, which hinders its ability to scale “vertically”. • GLOBAL SHARED STATE EVERYWHERE (!)
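
Note that the GIL stops Ruby code from running in parallel, but it does not make compound operations on shared state atomic. A thread switch can still land between the read and the write of `@value += 1`. A minimal sketch of guarding shared state with a `Mutex` follows; `SafeCounter` and `count_with_threads` are illustrative names, not part of any library.

```ruby
# Shared counter guarded by a Mutex. Without the lock, `@value += 1` is a
# read followed by a write, and a thread switch can land between the two.
class SafeCounter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @value += 1 }
  end

  def value
    @lock.synchronize { @value }
  end
end

# Starts `threads` workers that each increment the same counter.
def count_with_threads(threads:, increments:)
  counter = SafeCounter.new
  workers = Array.new(threads) do
    Thread.new { increments.times { counter.increment } }
  end
  workers.each(&:join)
  counter.value
end

count_with_threads(threads: 4, increments: 1000)
```

Every piece of mutable state reachable from more than one thread needs this kind of care, which is why shared global state is such a liability for concurrent Ruby code.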
  17. RUBY (MRI) & CONCURRENCY • There are tools like Celluloid, concurrent-ruby, etc. that are able to help you *a lot* when writing concurrent code. But there is no memory model, and there are portability issues. • I argue that they only “marginally extend” the spectrum of problems we are able to handle with Ruby. Ruby 3.0 _might_ make the situation better, but we’re far from it. • **IT IS** definitely possible to scale Ruby to tremendous loads. See Shopify as an example. • Ruby’s design goals are very different from those of Elixir & Erlang.
  18. • Elixir in production for 9 months: 2k commits, 300 PRs, +- 21k lines of code, 4+1 devs • ~ 150 modules
  19. • Elixir • Based on the Erlang VM. Can handle a lot of punching. • Erlang's runtime is extremely mature and battle-hardened. • Fault tolerance is a first class citizen. Hot-code reloads are possible. Many versions of the same module can coexist. • Erlang is built to yield uptimes up to 99.9999999% (really) • Communication is “shared nothing” and “default asynchronous”. • This is also known as the “Actor Model”.
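
To make “shared nothing” and “default asynchronous” concrete for Ruby readers, here is a toy actor written in Ruby: private state, a mailbox, and a single thread that drains it. This is only an analogy under stated assumptions. `CounterActor`, `tell`, and `ask` are hypothetical names, and Erlang processes are far lighter and better isolated than Ruby threads.

```ruby
# A toy "actor": private state, a mailbox, and one thread that drains it.
# Ruby analogy only; this is not how the Erlang VM implements processes.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @thread = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message)
    @mailbox << message
    nil
  end

  # Request/reply built on top of async sends, using a private reply queue.
  def ask(message)
    reply = Queue.new
    @mailbox << [message, reply]
    reply.pop
  end

  def stop
    @mailbox << :stop
    @thread.join
  end

  private

  def run
    count = 0   # state lives only inside this thread; nobody else can touch it
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then count += 1
      when :get then reply << count
      when :stop then break
      end
    end
  end
end

actor = CounterActor.new
3.times { actor.tell(:increment) }
actor.ask(:get)   # messages are handled in arrival order, so this sees 3
actor.stop
```

Because the counter is never shared, no lock is needed: all coordination happens through messages, which is the design the slide describes.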
  20. ELIXIR • Everything is immutable. • There is a compiler & Macros (!*). • Elixir is pragmatic (!!!) • Performance is great. GC happens per-process. (*) • Very small latency variance (!!) • Documentation & tooling are taken seriously. (*) • The runtime & OTP are AWESOME. Every process is preempted. No bad neighbors. SO CHEAP (~4k overhead)
  21. ELIXIR • State isolation & immutability allow you to think in terms of “bindings” and “values” instead of “addresses” and “memory”. • There is “one true way” to achieve coordination. And since nothing is shared you *don’t need to think about memory access patterns (!!!!)*
  22. “This is what I like about Erlang fault-tolerance approach. There are various options with strong guarantees. You can isolate crashes, but you can also connect failures if needed. Some scenarios may require more work, but the implementation is still straightforward. Supporting these scenarios without process isolation and crash propagation would be harder and you might end up reinventing parts of Erlang.” – Sasa Juric
  23. ELIXIR & ERLANG OTP • OTP brings you *standard* ways to package, start, inspect, stop, upgrade and debug applications *in production*. (with some discipline & configuration, we can solve many of the same problems microservices solve) • Backpressure is built-in in many places • The virtual machine provides you with a **ton** of metrics. It is trivial and non-intrusive to collect those. • Tracing is “cheap” and built-in. Think of “log on steroids”
  24. ELIXIR & ERLANG OTP • RECON gives you an insane amount of information: • Memory allocations • Queue sizes • Scheduler utilization • Process states • Crash-dump analysis • Tracing (!) • etc
  25. ELIXIR & ERLANG OTP • RECON gives you an insane amount of information: • Memory allocations • Queue sizes • Scheduler utilization • Process states • Crash-dump analysis • etc
  26. ELIXIR & ERLANG OTP • Distribution is transparent. See Joe Armstrong’s post about the “universal server”: http://joearms.github.io/2013/11/21/My-favorite-erlang-program.html
  27. ELIXIR & ERLANG OTP • The Phoenix team was able to handle 2M web sockets in a single box without fancy kernel optimizations. Maxed out on the limit of open files. (100k new conns/sec) • http://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections
  28. ELIXIR & ERLANG OTP • Standardization avoids fragmentation & lets you build much more *awesome tooling* (that’s what makes Rails great, right!?)
  29. PROBLEMS WE ENCOUNTERED SO FAR • The ecosystem is still very small (and somewhat buggy). • No easy & quick-win options for monitoring apps (like New Relic, AppSignal, Honeybadger). • Nothing as mature as Sidekiq (we have Exq and Verk, but still...).
  30. PROBLEMS WE ENCOUNTERED SO FAR • The ecosystem is still very small (and kinda buggy). • No usable client for Elasticsearch (*) • No usable (at the time?) library for exposing JSON API (*) • No usable (at the time) library to handle auth (*) • No usable bindings for GraphicsMagick (*) • Releases are so damn hard and un-12factor (*) (config) • Hot code reloads are much much harder than people say • Almost no problems with core libs like Ecto, Plug & Phoenix though.
  31. CONCLUSION • Elixir is *very promising*, and has learned a lot from other communities. It fundamentally changes the way we think about and architect systems • You can see that it evolves and taps into the learning experiences of other languages & communities. It still lacks a mature ecosystem but it is gaining traction fast. • If you truly aim to invest yourself in it, you must be ready to get your hands dirty and write some infrastructure you take for granted in other ecosystems. • Do *not* underestimate the complexity of managing library code + tests + docs + versioning + bug tracking.
  32. CONCLUSION • THERE ARE other ways to solve all the issues I talked about here. {Erlang,Elixir} is *not* the only kid on the block… (see how many data-infrastructure projects are Java-based) • However, the alternatives commonly require *a lot* of extra infrastructure and moving parts.
  33. CONCLUSION If you can solve “fault tolerance” & “distribution”, you have already solved “concurrency” & “scalability”.