Scalability and resilience in practice: current trends and opportunities

Julien Ponge
October 03, 2019

Presentation at the 38th International Symposium on Reliable Distributed Systems (SRDS 2019) in Lyon, France.

Transcript

  1. SRDS Industrial Session, October 2019, Lyon, France

    Scalability and resilience in practice: current trends and opportunities. Dr Julien Ponge, Principal Software Engineer; Dr Mark Little, VP Middleware Engineering.
  2. Disclaimer

    The following content reflects the views of the authors, not necessarily those of Red Hat. They do not constitute in any way a binding or legal agreement or impose any legal obligation or duty on Red Hat. This information is provided for discussion purposes only and is subject to change for any or no reason.
  3. Today’s topic

    Scalability + Resilience (in practice). ❌ Deep learning ❌ AI ❌ Dark mode ❌ Blockchain
  4. Modern Distributed Systems: “You can’t ignore the network anymore…”

  5. Sample use-case

    Walk 10k steps every day. ⌚ Wear a pedometer. Be congratulated!
  7. Event-driven micro-services

    Services: User profile service, Activity service, Ingestion service, Public API, Event stats service, Congrats service.
    Web applications: User webapp, Dashboard webapp.
    Infrastructure and protocols: Kafka topics, MongoDB, PostgreSQL, ActiveMQ Artemis, SMTP, AMQP, HTTP.
  8. Elasticity and application state

    Persistent & replicated state. Micro-service (or “function”). Events. Streams. Other services.
  9. Elasticity and application state

    Persistent & replicated state. Micro-service (or “function”). Events. Streams. Other services. State boundaries + life-time. Idempotency?
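
The "Idempotency?" question above can be illustrated with a minimal sketch. It assumes each event carries a unique identifier, so that a redelivered event can be detected and ignored. The `StepEvent` and `IdempotentStepCounter` names and the in-memory storage are hypothetical and not taken from the talk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory consumer; a real service would persist the processed
// IDs together with the totals so that both survive a restart.
final class IdempotentStepCounter {
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, Integer> stepsByDevice = new HashMap<>();

    // Returns true if the event was applied, false if it was a duplicate.
    synchronized boolean apply(StepEvent event) {
        if (!processedEventIds.add(event.eventId)) {
            return false; // already seen: a redelivery must not change state
        }
        stepsByDevice.merge(event.deviceId, event.steps, Integer::sum);
        return true;
    }

    synchronized int totalSteps(String deviceId) {
        return stepsByDevice.getOrDefault(deviceId, 0);
    }

    public static void main(String[] args) {
        IdempotentStepCounter counter = new IdempotentStepCounter();
        StepEvent event = new StepEvent("evt-1", "watch-42", 500);
        counter.apply(event);
        counter.apply(event); // same event delivered twice
        System.out.println(counter.totalSteps("watch-42")); // prints 500
    }
}

// Hypothetical event shape: unique event ID, device ID, step count.
final class StepEvent {
    final String eventId;
    final String deviceId;
    final int steps;

    StepEvent(String eventId, String deviceId, int steps) {
        this.eventId = eventId;
        this.deviceId = deviceId;
        this.steps = steps;
    }
}
```

Deduplicating by event ID is only one option; designing updates that are naturally idempotent (for example, storing the latest absolute step count instead of adding increments) avoids keeping the set of processed IDs.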
  10. Service mesh: connect, secure, control and observe

  11. Density is key

    From https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
  12. Reactive Systems: “Searching for resource-efficiency”

  13. Reactive: “Responding to stimuli”

    Reactive systems: Manifesto, Actor, Messages, Resilience, Elasticity, Scalability, Asynchronous, non-blocking (Akka, Vert.x).
    Reactive streams: Data flow, Back-pressure, Non-blocking (Akka Streams, RxJava, Reactor, Vert.x).
    Reactive programming: Data flow, Events, Observable, Spreadsheets (Reactor, Reactive Spring, MS Excel, RxJava, Vert.x).
  14. Reactive Manifesto

    Message Driven: asynchronous, location-transparent.
    Resilient: isolation, replication.
    Elastic: start / stop instances.
    Responsive: consistent latency.
  15. x 1000 = Async I/O to the rescue!

  17. Isolation and error management

    Service, Database, Request ❌. Can we still provide a response? Cached data, default response, …or just a timely error.
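
One way to read the options above (cached data, a default response, or a timely error) is as a bounded wait followed by a fallback. The sketch below uses the JDK's `CompletableFuture`. The `TimelyResponses` class and its `withFallback` helper are hypothetical names, and the cached value is assumed to be already available to the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class TimelyResponses {

    // Runs the (possibly slow or failing) lookup asynchronously and answers
    // with the cached value if the lookup fails or exceeds the timeout.
    static CompletableFuture<String> withFallback(
            Supplier<String> databaseLookup, String cachedValue, long timeoutMillis) {
        return CompletableFuture
            .supplyAsync(databaseLookup)
            // Slow dependency: complete with the cached value instead of waiting.
            .completeOnTimeout(cachedValue, timeoutMillis, TimeUnit.MILLISECONDS)
            // Failing dependency: same fallback rather than propagating the error.
            .exceptionally(err -> cachedValue);
    }

    public static void main(String[] args) {
        String response = withFallback(
            () -> { throw new IllegalStateException("database unavailable"); },
            "cached", 200).join();
        System.out.println(response); // prints cached
    }
}
```

When no sensible fallback exists, replacing `completeOnTimeout` with `orTimeout` gives the "timely error" variant: the caller receives a failure quickly instead of a stale value.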
  18. Damage control with a circuit breaker

    Components: Database, Circuit breaker.
    States: Closed, Open, Half-open.
    Transition labels: fail (threshold reached), call, reset timeout, fail, success, success, fail (below threshold).
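
The state machine above can be sketched in a few lines. This follows the usual reading of the pattern rather than the exact arrows of the slide: Closed moves to Open once the failure threshold is reached, Open moves to Half-open after the reset timeout, and a Half-open trial call either closes the breaker again or re-opens it. `SimpleCircuitBreaker` is a hypothetical class, not the API of Vert.x or any other library, and the clock is injected only to make the behaviour easy to test.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long resetTimeoutMillis;
    private final LongSupplier clock; // current time in milliseconds
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // Open -> Half-open once the reset timeout has elapsed.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Simplification: calls are serialized by the lock held during the operation.
    synchronized <T> T call(Supplier<T> operation, T fallback) {
        if (state() == State.OPEN) {
            return fallback; // fail fast without touching the dependency
        }
        try {
            T result = operation.get();
            failures = 0;
            state = State.CLOSED; // success (also closes a Half-open breaker)
            return result;
        } catch (RuntimeException err) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN; // threshold reached, or the trial call failed
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
            new SimpleCircuitBreaker(2, 1000, System::currentTimeMillis);
        for (int i = 0; i < 2; i++) {
            breaker.call(() -> { throw new IllegalStateException("db down"); }, "fallback");
        }
        System.out.println(breaker.state()); // prints OPEN
    }
}
```

Production implementations typically add per-call timeouts, metrics, and non-blocking execution, all of which this sketch leaves out.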
  19. Reactive toolkit for the JVM

    All kinds of distributed services. Resource-friendly. Fast.
  20. Related Research & Opportunities

  21. Async is hard(er) (callback hell is just one facet)

    Image from https://adrianalonso.es/desarrollo-web/apis/trabajando-con-promises-pagination-promise-chain/
  22. Taming asynchronous operations

    Promise / Future: 1 item (or none)

        while(stream.hasNext()) {
          stream.fetchNextElement()
            .then(this::storeInDb)
            .then(this::incrementDistributedCounter)
            .catch(this::handleError);
        }

    Reactive extensions: hot / cold streams, back-pressure*, functional combinators

        stream.toFlowable()
          .flatMap(db::store)
          .flatMap(distributedCounter::increment)
          .timeout(5, SECONDS)
          .retry(3)
          .subscribe(this::onNext, this::onError, this::onComplete);

    Coroutines / fibers: async disguised as regular imperative, rewritten as continuations

        try {
          while(stream.hasNext()) {
            item = stream.next();
            db.store(item);
            distributedCounter.increment();
          }
        } catch (Throwable err) {
          weHaveAProblem(err);
        }

    * Only with reactive-streams implementations, not in the original Erik Meijer paper.
  23. Language and runtime

    Coroutines and reactive extensions do not solve all problems. Asynchronous abstractions in programming languages remain an interesting topic!
    Deadlocks, soundness, expressiveness, back-pressure tuning, memory exhaustion, error handling (…)
  24. Compilation and runtime

    JVM: open world assumption, speculative code generation, peak performance, needs more RAM.
    Native images: closed world assumption, no JIT compiler, boots fast, needs less RAM.
    OpenJDK. GraalVM from Oracle Labs (with Linz University and more). Backed by 20+ years of research.
  25. Powered by and more!

  26. Distributed consensus

    { active_users : 32168 } [123, “Lyon”, 10000]
    Replicated data stores. Discovery, global state, …
  27. Raft

    “Paxos, but with bolts and nuts to implement it”
  28. Flexible Paxos (Heidi Howard)
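
The slide gives only the title, so as background: the key observation of Flexible Paxos is that majorities are not required in both phases. It is enough that every phase-1 (leader election) quorum intersects every phase-2 (replication) quorum. For quorums defined purely by size over n nodes, that means q1 + q2 > n. The sketch below only checks this arithmetic; `QuorumSizes` and its method names are hypothetical.

```java
final class QuorumSizes {

    // Classic Paxos: both phases use a strict majority.
    static int majority(int nodes) {
        return nodes / 2 + 1;
    }

    // Flexible Paxos, size-based quorums: any phase-1 quorum must overlap
    // any phase-2 quorum, which holds exactly when q1 + q2 > nodes.
    static boolean phasesIntersect(int nodes, int phase1Quorum, int phase2Quorum) {
        boolean sizesValid = phase1Quorum >= 1 && phase1Quorum <= nodes
                          && phase2Quorum >= 1 && phase2Quorum <= nodes;
        return sizesValid && phase1Quorum + phase2Quorum > nodes;
    }

    public static void main(String[] args) {
        int nodes = 10;
        // Majorities in both phases always intersect.
        System.out.println(phasesIntersect(nodes, majority(nodes), majority(nodes))); // prints true
        // Small replication quorum paid for by a large election quorum.
        System.out.println(phasesIntersect(nodes, 8, 3)); // prints true
        // Two halves of an even cluster can be disjoint.
        System.out.println(phasesIntersect(nodes, 5, 5)); // prints false
    }
}
```

The trade-off shown in `main` is the practical appeal: a smaller phase-2 quorum lowers replication latency in the common case, at the cost of needing more nodes available to elect a new leader.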

  29. These are exciting times for research and practice in distributed systems!

  30. Thank you

    Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
    linkedin.com/company/red-hat, youtube.com/user/RedHatVideos, facebook.com/redhatinc, twitter.com/RedHat