RuPy Campinas 2016 - Designing concurrent applications

Renan Ranelli

June 18, 2016

Transcript

  1. DESIGNING CONCURRENT SYSTEMS: A COMPARATIVE APPROACH Renan Ranelli

  5. ELIXIR @ XERPA

  7. WE ARE HIRING (especially if you know frontend)

  8. AGENDA
    • Why concurrency matters
    • The “stages” of applications regarding concurrency and distribution
    • Concurrency & distribution is hard
    • How Ruby deals with it
    • Elixir & Erlang introduction
    • How Erlang-land stuff impacts the way we design such systems
    • Conclusions
  9. DISCLAIMER
    • This talk will give you more questions than answers
    • I absolutely won’t tell you “what to do”
    • I will assume “some” knowledge of the backend topics discussed today
    • I will not show you a {case,example}, so that we have more time to talk about stuff
  10. WHY CONCURRENCY MATTERS

  11. HARDWARE: Google the article “free lunch is over” and read it. Seriously.
  13. WHAT’S HAPPENING IN HARDWARE-LAND? In summary:
    – CPU clock speed is no longer growing exponentially
    – But we are still getting exponentially more transistors (more cores!)
    – No substantial gains in sequential performance*
    – Concurrency is the next major revolution in how we write software
  14. > Efficiency and performance optimization will get more, not less, important
  15. This was written in March 2005.

  16. 2005

  17. In 2005:
    – Java 5 was hot news.
    – Windows XP. Vista only in 2007.
    – No AWS, Twitter, Netflix.
    – Ruby on Rails 1.0 only in December.
    – YouTube had just been founded.
    – You had never heard of Justin Bieber.
    – NO STACK OVERFLOW.
  18. This was written in March 2005

  19. FUNCTIONAL PROGRAMMING (!)

  20. FUNCTIONAL PROGRAMMING • Every* programming language created since then has had *concurrency* as a major focus:
    – Scala, Clojure
    – Elixir
    – Go, Rust
    – Kotlin, Nim, Pony, Crystal, etc…
    – … and the rediscovery of Erlang and Haskell
  21. THERE IS LIFE BEYOND OOP. IF YOU WANT TO KNOW MORE ABOUT THESE THEMES:
    • https://www.youtube.com/watch?v=njAMVB02Ag0
    • https://www.youtube.com/watch?v=kiaZd8dmbtI
  22. HOW OUR APPS EVOLVE INTO THE NEED FOR CONCURRENCY

  24. THE STAGES OF OUR APPS
    • 1 - Very low load application (10s of page views/sec)
    • 2 - Small, has users. Now we need to take some things to the background
    • 3 - Some complex and “long” async workflows. A *big* machine still solves our issues
    • 4 - Tons of machines, apps, developers and multiple data stores. Requires lots of coordination and distribution. Real-time & stateful stuff. A “real backend”. THIS IS HELL HARD
  25. CONCURRENCY & DISTRIBUTION IS **HARD**

  26. WHERE DO WE NEED TO DEAL WITH IT?
    • You need to be aware of concurrency in the data tier
    • Reasoning about “time” and ordering sucks.
    • Operating the system is hard.
  27. THE DATA TIER IN A DISTRIBUTED SYSTEM • Your default database configuration probably won’t shield you against race conditions (!)
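A typical race: application code reads a value, computes on it, and writes it back. Under common default isolation levels (for example PostgreSQL's read committed), two clients doing this concurrently can silently lose an update. Below is a minimal Python sketch. It assumes a toy in-memory `Row` class (hypothetical, not a database driver) whose lock only models the atomicity of a single `UPDATE` statement. The sketch shows a blind write losing an increment, then shows optimistic concurrency control (compare-and-set on a version column) preventing the loss.

```python
import threading


class Row:
    """Toy stand-in for one database row with a version column (hypothetical).

    The lock only models the fact that a single UPDATE statement is atomic;
    it does NOT make a read followed by a write atomic.
    """

    def __init__(self):
        self.value = 0
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def write_blind(self, value):
        # Like: UPDATE t SET value = ?  (last writer wins)
        with self._lock:
            self.value = value
            self.version += 1

    def compare_and_set(self, value, expected_version):
        # Like: UPDATE t SET value = ?, version = version + 1 WHERE version = ?
        with self._lock:
            if self.version != expected_version:
                return False  # someone else wrote in between: caller must retry
            self.value = value
            self.version += 1
            return True


def lost_update_demo():
    """Two clients increment the same row with an unlucky interleaving."""
    row = Row()
    a, _ = row.read()       # client A reads 0
    b, _ = row.read()       # client B also reads 0, before A writes
    row.write_blind(a + 1)  # A writes 1
    row.write_blind(b + 1)  # B writes 1, silently discarding A's increment
    return row.value


def safe_increment(row):
    """Optimistic concurrency control: retry until no conflicting write."""
    while True:
        value, version = row.read()
        if row.compare_and_set(value + 1, version):
            return


def concurrent_safe_demo(threads=4, per_thread=1000):
    row = Row()

    def worker():
        for _ in range(per_thread):
            safe_increment(row)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return row.value


print(lost_update_demo())      # 1, not 2
print(concurrent_safe_demo())  # 4000
```

In SQL, the same idea is an `UPDATE ... WHERE version = ?` whose affected-row count you check. Pessimistic locking with `SELECT ... FOR UPDATE` or a stricter isolation level are the usual alternatives.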
  29. THE DATA TIER IN A DISTRIBUTED SYSTEM
    • MongoDB and its isolation levels.
    • Mongo basically screws up every guarantee it claims to give you. Kyle Kingsbury (@aphyr) tells you not to use it. See the posts below for more info:
      https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
      https://aphyr.com/posts/284-jepsen-mongodb
  30. THE DATA TIER IN A DISTRIBUTED SYSTEM • When dealing with Amazon’s S3:
    • It is safe to “read after create”.
    • It is not safe to “read after update”. Guess who got a production issue because of that…
  31. TIME IS NOT WHAT YOU EXPECT
    • Clocks drift (hence vector clocks)
    • Locks are hard. Counters are hard.
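Wall clocks drift, so ordering events from different machines by timestamp is unreliable. Vector clocks track causality instead. Below is a minimal Python sketch, assuming clocks are plain dicts that map a node name to a counter (the node names are made up for the example). Note that vector clocks do not give a total order. When two events come out as "concurrent", the application still has to resolve the conflict.

```python
def tick(clock, node):
    """Return a copy of `clock` with `node`'s entry incremented (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def receive(local, incoming, node):
    """On message receipt: take the element-wise max, then count the receive event."""
    merged = dict(local)
    for other, count in incoming.items():
        merged[other] = max(merged.get(other, 0), count)
    return tick(merged, node)


def compare(a, b):
    """Order two vector clocks causally: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a1 = tick({}, "A")         # event on node A
b1 = tick({}, "B")         # independent event on node B
b2 = receive(b1, a1, "B")  # B receives A's message: {'B': 2, 'A': 1}
print(compare(a1, b1))     # concurrent
print(compare(a1, b2))     # before
```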
  32. TIME IS NOT WHAT YOU EXPECT
    • Two things are hard about distributed systems:
    • 2. ensure messages are “delivered once”
    • 1. ensure messages are “delivered in order”
    • 2. ensure messages are “delivered once”
    • That’s why distributed locks & counters are hard.
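In practice, the common workaround is not true exactly-once delivery. Systems instead combine at-least-once delivery (the sender keeps resending) with a receiver that discards duplicates and restores order. Below is a minimal Python sketch, assuming a single sender that attaches gap-free sequence numbers starting at 1. The class name is hypothetical, and its state lives only in memory, whereas a real receiver would have to persist it.

```python
class OrderedDeduplicatingReceiver:
    """Processes messages from ONE sender exactly once and in order.

    Assumes the sender numbers messages 1, 2, 3, ... without gaps and resends
    anything it is unsure about (at-least-once delivery).
    """

    def __init__(self):
        self.next_seq = 1
        self.pending = {}    # seq -> payload, for messages that arrived early
        self.processed = []  # stands in for "apply the side effect"

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already processed or already buffered
        self.pending[seq] = payload
        # Flush every message that is now contiguous with what we processed.
        while self.next_seq in self.pending:
            self.processed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


rx = OrderedDeduplicatingReceiver()
for seq, payload in [(2, "b"), (1, "a"), (1, "a"), (3, "c"), (2, "b")]:
    rx.receive(seq, payload)  # reordered and duplicated delivery
print(rx.processed)  # ['a', 'b', 'c']
```

If a message is lost for good, everything after it waits in `pending` forever. That is why the sender must keep retrying, and why this trick moves the hard problem around rather than removing it.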
  33. OPERATION IS HARD
    • Failure is all too frequent.
    • Metrics, alerts and log aggregation are a must-have.
    • You need to handle back-pressure & timeouts when integrating systems.
    • Releasing a new version is *not* instantaneous. You gotta deal with different live versions at the same time.
    • You need to be able to easily “inspect” the whole system when debugging. SSH-ing into individual nodes won’t cut it.
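Here is a minimal Python sketch of the "timeouts" part of that bullet. It assumes a hypothetical `slow_dependency` standing in for a call to another system. The caller bounds how long it is willing to wait and falls back to a degraded answer instead of hanging.

```python
import concurrent.futures
import time


def slow_dependency(delay):
    """Hypothetical remote call; `delay` simulates its latency in seconds."""
    time.sleep(delay)
    return "fresh data"


def call_with_timeout(pool, fn, *args, timeout, fallback):
    """Wait at most `timeout` seconds for fn(*args); otherwise return `fallback`."""
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the call has not started yet; a running
        # thread cannot be killed, so the caller simply stops waiting for it.
        future.cancel()
        return fallback


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    fast = call_with_timeout(pool, slow_dependency, 0.0, timeout=1.0, fallback="cached data")
    slow = call_with_timeout(pool, slow_dependency, 0.5, timeout=0.05, fallback="cached data")

print(fast)  # fresh data
print(slow)  # cached data
```

For comparison, Elixir's `GenServer.call/3` takes a timeout argument that defaults to 5000 ms.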
  34. OPERATION IS HARD • IT IS YOUR RESPONSIBILITY TO DESIGN & BUILD AN OPERABLE SYSTEM (!!!)
  35. HOW RUBY FARES IN THIS SCENARIO

  36. RUBY (MRI) & CONCURRENCY
    • Ruby’s runtime is *slow*.
    • The GIL lets only one thread run Ruby code at a time, which rules out “parallelism” for Ruby threads and hinders Ruby’s ability to scale “vertically”.
    • GLOBAL SHARED STATE EVERYWHERE (!)
  38. RUBY (MRI) & CONCURRENCY
    • There are tools like Celluloid, concurrent-ruby, etc. that can help you *a lot* when writing concurrent code. Still, Ruby has no memory model, and there are portability issues.
    • I argue that they only “marginally extend” the spectrum of problems we are able to handle with Ruby. Ruby 3.0 _might_ make the situation better, but we’re far from it.
    • **IT IS** definitely possible to scale Ruby to tremendous loads. See Shopify as an example.
    • Ruby’s design goals are very different from those of Elixir & Erlang.
  39. ELIXIR & ERLANG

  40. • Elixir in production for 9 months: 2k commits, 300 PRs, ~21k lines of code, 4+1 devs
    • ~150 modules
  41. • Elixir
    • Based on the Erlang VM. Can take a lot of punching.
    • Erlang’s runtime is extremely mature and battle-hardened.
    • Fault tolerance is a first-class citizen. Hot code reloads are possible. Two versions (old and current) of the same module can coexist.
    • Erlang is built to yield uptimes of up to 99.9999999% (really)
    • Communication is “shared nothing” and “asynchronous by default”.
    • This is also called the “actor model”.
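To make "shared nothing, asynchronous by default" concrete, here is a minimal Python sketch of a single actor. Threads and `queue.Queue` stand in for Erlang processes and mailboxes. This is a rough analogy only: BEAM processes are far lighter than OS threads, and Python cannot enforce isolation. The `CounterActor` class and its message tuples are hypothetical.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, one mailbox, one thread handling messages
    one at a time. Python cannot enforce "shared nothing"; here it is only
    a convention (nobody but the actor's thread touches `_count`)."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue and return immediately."""
        self._mailbox.put(message)

    def _loop(self):
        while True:
            tag, arg = self._mailbox.get()
            if tag == "increment":
                self._count += arg
            elif tag == "get":
                arg.put(self._count)  # arg is the caller's reply queue
            elif tag == "stop":
                return


def ask(actor, timeout=1.0):
    """Synchronous request built on top of async messages: send, then wait."""
    reply_to = queue.Queue()
    actor.send(("get", reply_to))
    return reply_to.get(timeout=timeout)


def send_increments(actor, n):
    """Fire-and-forget: each send returns without waiting for the actor."""
    for _ in range(n):
        actor.send(("increment", 1))


counter = CounterActor()
senders = [threading.Thread(target=send_increments, args=(counter, 250)) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()
total = ask(counter)
print(total)  # 1000: no user-level lock, the mailbox serializes the updates
counter.send(("stop", None))
```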
  43. ELIXIR
    • Everything is immutable.
    • There is a compiler & macros (!*).
    • Elixir is pragmatic (!!!)
    • Performance is great. GC happens per process. (*)
    • Very small latency variance (!!)
    • Documentation & tooling are taken seriously. (*)
    • The runtime & OTP are AWESOME. Every process is preempted. No bad neighbors. Processes are SO CHEAP (~4k overhead)
  44. ELIXIR
    • State isolation & immutability allow you to think in terms of “bindings” and “values” instead of “addresses” and “memory”.
    • There is “one true way” to achieve coordination (message passing). And since nothing is shared, you *don’t need to think about memory access patterns (!!!!)*
  46. State is privately held here

  47. State is privately held here. These fail together

  48. These “compose”

  51. “This is what I like about Erlang fault-tolerance approach. There are various options with strong guarantees. You can isolate crashes, but you can also connect failures if needed. Some scenarios may require more work, but the implementation is still straightforward. Supporting these scenarios without process isolation and crash propagation would be harder and you might end up reinventing parts of Erlang.” (Saša Jurić)
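The restart side of "isolate crashes" is what an OTP supervisor does: when a child crashes, it is started again, and if it keeps crashing the supervisor gives up and escalates to its own parent. Below is a minimal Python sketch of that idea. All names are hypothetical, and it is simplified on purpose. It counts restarts over the supervisor's whole lifetime, whereas OTP counts them within a time window (Elixir's `Supervisor` options `:max_restarts` and `:max_seconds`, defaulting to 3 and 5).

```python
class RestartLimitExceeded(Exception):
    """Raised when a child keeps crashing: the failure is escalated upward."""


def supervise(start_child, max_restarts=3):
    """Run start_child(); on a crash, start it again, up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return start_child()
        except Exception as exc:  # "let it crash": no attempt to patch up state
            restarts += 1
            if restarts > max_restarts:
                raise RestartLimitExceeded(f"gave up after {max_restarts} restarts") from exc


def make_flaky_child(crashes):
    """Hypothetical child that crashes the first `crashes` times it is started.

    The counter lives outside the child, like a flaky external dependency.
    """
    attempts = 0

    def child():
        nonlocal attempts
        attempts += 1
        if attempts <= crashes:
            raise RuntimeError(f"crash #{attempts}")
        return "done"

    return child


print(supervise(make_flaky_child(2)))  # done (after two restarts)
try:
    supervise(make_flaky_child(5), max_restarts=3)
except RestartLimitExceeded as err:
    print(err)  # gave up after 3 restarts
```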
  52. ELIXIR & ERLANG OTP
    • OTP brings you *standard* ways to package, start, inspect, stop, upgrade and debug applications *in production* (with some discipline & configuration, we can solve many of the same problems microservices solve).
    • Backpressure is built in in many places.
    • The virtual machine provides you with a **ton** of metrics. Collecting them is trivial and non-intrusive.
    • Tracing is “cheap” and built in. Think of it as “logging on steroids”.
  53. ELIXIR & ERLANG OTP • BACKPRESSURE IS SOMETHING YOU CAN’T IGNORE
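A bounded buffer is one common way to get backpressure. When the consumer falls behind, the producer is forced to wait, or to shed load, instead of piling up unbounded work. Below is a minimal Python sketch using `queue.Queue(maxsize=...)`. It illustrates the idea only and is not how the BEAM does it. Erlang mailboxes are unbounded, so there backpressure typically comes from mechanisms such as synchronous `GenServer.call`s, which make the caller wait for a reply.

```python
import queue
import threading
import time


def run_pipeline(n_items, capacity, consumer_delay):
    """Fast producer, slow consumer, bounded buffer in between."""
    buffer = queue.Queue(maxsize=capacity)
    processed = []
    max_backlog = 0

    def consumer():
        while True:
            item = buffer.get()
            if item is None:  # sentinel: producer is done
                return
            time.sleep(consumer_delay)  # simulate slow work
            processed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n_items):
        buffer.put(i)  # blocks while the buffer is full: the producer is slowed down
        max_backlog = max(max_backlog, buffer.qsize())
    buffer.put(None)
    worker.join()
    return processed, max_backlog


def try_submit(buffer, item):
    """Load-shedding alternative: refuse work instead of blocking."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False  # tell the caller to back off or retry later


processed, max_backlog = run_pipeline(n_items=20, capacity=5, consumer_delay=0.001)
print(processed == list(range(20)), max_backlog <= 5)  # True True
```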
  57. ELIXIR & ERLANG OTP • RECON gives you an insane amount of information:
    • Memory allocations
    • Queue sizes
    • Scheduler utilization
    • Process states
    • Crash-dump analysis
    • Tracing (!)
    • etc.
  59. ELIXIR & ERLANG OTP • Distribution is transparent. See Joe Armstrong’s post about the “universal server”: http://joearms.github.io/2013/11/21/My-favorite-erlang-program.html
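Joe Armstrong's universal server is a process that waits for a `{become, F}` message and then turns into whatever server `F` describes. Below is a rough Python transliteration of the "become" idea. Threads and queues stand in for processes and mailboxes, and `spawn` and the message shapes are made up here. What the sketch cannot show is the point of the slide: on the BEAM, the same messages work unchanged when the process lives on another node.

```python
import math
import queue
import threading


def spawn(behaviour):
    """Start `behaviour(mailbox)` in a new thread and return its mailbox."""
    mailbox = queue.Queue()
    threading.Thread(target=behaviour, args=(mailbox,), daemon=True).start()
    return mailbox


def universal_server(mailbox):
    """Does nothing until told what to become, then becomes it."""
    tag, behaviour = mailbox.get()
    if tag == "become":
        behaviour(mailbox)


def factorial_server(mailbox):
    """Replies to (reply_to, n) with n!; stops on (None, None)."""
    while True:
        reply_to, n = mailbox.get()
        if reply_to is None:
            return
        reply_to.put(math.factorial(n))


server = spawn(universal_server)
server.put(("become", factorial_server))  # turn the idle server into a factorial server
reply = queue.Queue()
server.put((reply, 5))
result = reply.get(timeout=1.0)
print(result)  # 120
server.put((None, None))
```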
  60. ELIXIR & ERLANG OTP
    • The Phoenix team was able to handle 2M WebSockets on a single box without fancy kernel optimizations. They maxed out on the limit of open files. (100k new conns/sec)
    • http://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections
  61. ELIXIR & ERLANG OTP • Standardization avoids fragmentation & lets you build much more *awesome tooling* (that’s what makes Rails great, right!?)
  62. NICE! BUT WHAT ABOUT THE CURRENT STATE OF THE ECOSYSTEM?

  63. PROBLEMS WE ENCOUNTERED SO FAR
    • The ecosystem is still very small (and somewhat buggy).
    • No easy & quick-win options for monitoring apps (like New Relic, AppSignal, Honeybadger).
    • Nothing as mature as Sidekiq (we have Exq and Verk, but still...).
  64. PROBLEMS WE ENCOUNTERED SO FAR
    • The ecosystem is still very small (and kinda buggy).
    • No usable client for Elasticsearch (*)
    • No usable (at the time?) library for exposing JSON API (*)
    • No usable (at the time) library to handle auth (*)
    • No usable bindings for GraphicsMagick (*)
    • Releases are so damn hard and un-12-factor (*) (config)
    • Hot code reloads are much, much harder than people say
    • Almost no problems with core libs like Ecto, Plug & Phoenix, though.
  65. YOU GOTTA GET YOUR HANDS DIRTY

  72. CONCLUSION

  73. CONCLUSION
    • Elixir is *very promising* and has learned a lot from other communities. It fundamentally changes the way we think about and architect systems.
    • You can see that it evolves and taps into the learning experiences of other languages & communities. It still lacks a mature ecosystem, but it is gaining traction fast.
    • If you truly aim to invest yourself in it, you must be ready to get your hands dirty and write some infrastructure you take for granted in other ecosystems.
    • Do *not* underestimate the complexity of managing library code + tests + docs + versioning + bug tracking.
  74. CONCLUSION
    • THERE ARE other ways to solve all the issues I talked about here. {Erlang,Elixir} is *not* the only kid on the block… (see how many data-infrastructure projects are Java-based)
    • However, the alternatives commonly require *a lot* of extra infrastructure and moving parts.
  75. CONCLUSION If you can solve “fault tolerance” & “distribution”, you have already solved “concurrency” & “scalability”.
  76. THANK YOU!
