Towards Stateful FaaS on Streaming Dataflow

adilakhter
October 08, 2019

Transcript

  1. Towards Stateful FaaS

    Flink Forward Berlin • 08 October 2019
    Marios Fragkoulis & Adil Akhter
  2. Outline

  3. • Motivation • Rho - FaaS beyond Stateless • Research Direction • Conclusion
  4. Motivation

  5. Web application, DB. Credit for icons: https://www.flaticon.com

  6. Web application, DB

    Operations, Configuration, Programming model, Debugging, Failure handling, Transaction management, Fault tolerance, Scaling, Monitoring, Deployment
    Credit for icons: https://www.flaticon.com
  7. Web application, DB

    IaaS, PaaS, SaaS, Serverless, FaaS
    Operations, Configuration, Programming model, Debugging, Failure handling, Transaction management, Fault tolerance, Scaling, Monitoring, Deployment
    Credit for icons: https://www.flaticon.com
  8. FaaS

    MS Azure Functions, Google Cloud Functions
    Credit for logos: https://aws.amazon.com/lambda/ https://cloud.google.com/functions/
  9. FaaS

    Diagram: a grid of functions (Fn) and cloud storage.
    Managed infrastructure, Function-based programming model ✅
    Credit for icons: https://www.flaticon.com
  10. FaaS

    Diagram: a grid of functions (Fn) and cloud storage.
    Stateless functions, Stateful functions, Fn-to-fn calls, Coordination ❌
    Credit for icons: https://www.flaticon.com
  11. If not FaaS then what?

  12. Services Architecture (1): Easiest Implem.

    Diagram labels: Order Business Logic, Stock Business Logic, Payments Business Logic, DB, RPC Call / RPC Response (×3), REST Call (×2).
    ▪ Perform an order iff there is stock available and the payment is cleared.
    ▪ Services are stateless.
    ▪ Database does the heavy lifting.
    ▪ High latency, costly coordination calls.
  13. Services Architecture (2): Embedded State/DB

    Diagram labels: Order, Stock, and Payments, each with its own DB and Business Logic; REST Call (×2).
    ▪ Make state of each service local to the business logic.
    ▪ Services are now stateful.
    ▪ Low-latency access to local state.
    ▪ Service calls are still expensive.
    ▪ Not obvious how to scale this out.
  14. Services Architecture (3): Event Sourcing

    Diagram labels: Order, Stock, and Payments, each with its own DB and Business Logic; two event-logs; REST Call (×2).
    ▪ Each message exchange/change to the state goes through an event-log.
    ▪ Services are asynchronous/reactive.
    ▪ If we lose state, we replay the log and rebuild it.
    ▪ Time-travel debugging, audits, etc. are trivial.
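The replay idea on this slide can be illustrated with a minimal sketch. It is not code from the talk: `StockEvent`, `Restocked`, `Reserved`, and the helper names are hypothetical, chosen only to show that state is a fold over the event log and that an earlier state can be recovered by replaying a prefix of the log.

```scala
object EventSourcingSketch {
  // Hypothetical events for a stock service; these types are not from the talk.
  sealed trait StockEvent
  final case class Restocked(item: String, qty: Int) extends StockEvent
  final case class Reserved(item: String, qty: Int) extends StockEvent

  // Service state: units available per item.
  type Stock = Map[String, Int]

  // Applies a single logged event to the current state.
  def applyEvent(state: Stock, event: StockEvent): Stock = event match {
    case Restocked(item, qty) => state.updated(item, state.getOrElse(item, 0) + qty)
    case Reserved(item, qty)  => state.updated(item, state.getOrElse(item, 0) - qty)
  }

  // Rebuilds the state from scratch by replaying the whole log in order.
  def replay(log: Seq[StockEvent]): Stock =
    log.foldLeft(Map.empty[String, Int])(applyEvent)

  // State as it was after the first n events: a simple form of time travel.
  def replayUntil(log: Seq[StockEvent], n: Int): Stock =
    replay(log.take(n))
}
```

Because the log is the source of truth, losing the in-memory map costs only the time needed to call `replay` again.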
  15. Services Architecture (4): Scalable Deployment

    Diagram labels: Order 1, Order 2, Order 3, Stock 1, Stock 2, Payment 1 (each with Business Logic and its own DB); three event-logs; RPC Calls; Subscribe for Responses.
  16. Does it ring a bell?

    ▪ Millions of events per second on a couple of machines.
    ▪ Consistent snapshots of state, with exactly-once guarantees.
    ▪ We can scale operators in and out to handle varying workloads.
    Diagram labels: Input Message Queues, OP1, OP2, OP3, OP4, OP5, Output Message Queue, Apache Flink.
  17. Stateful FaaS on Streaming Dataflow

    ▪ Time-travel debugging using checkpoints and message broker.
    ▪ Guaranteed message delivery and exactly-once processing.
    ▪ Each operator executes a group of functions that share the same state.
    ▪ Operator-local state partitioned on key input for scalability and fault-tolerance.
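To make "operator-local state partitioned on key" concrete, here is a deliberately simplified sketch. It assumes plain hash partitioning over a fixed number of parallel instances; this is an illustration only and not the partitioning scheme Apache Flink or Rho actually uses. All names are hypothetical.

```scala
object KeyedStateSketch {
  // Routes a key to one of `parallelism` partitions by hashing it.
  // Assumption: simple modulo hashing, used here only for illustration.
  def partitionFor(key: String, parallelism: Int): Int =
    java.lang.Math.floorMod(key.hashCode, parallelism)

  // Counts occurrences per key, keeping each counter in the local state
  // of the single partition that owns that key.
  def countPerKey(keys: Seq[String], parallelism: Int): Vector[Map[String, Long]] = {
    val empty = Vector.fill(parallelism)(Map.empty[String, Long])
    keys.foldLeft(empty) { (parts, key) =>
      val i = partitionFor(key, parallelism)
      val local = parts(i)
      parts.updated(i, local.updated(key, local.getOrElse(key, 0L) + 1L))
    }
  }
}
```

Since every key is owned by exactly one partition, no two instances ever update the same counter, which is what lets the instances scale out without coordinating on state.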
  18. Dataflow graphs as a backend for Stateful Services

    Dataflow graphs could serve as a scalable backend for microservices and cloud applications based on stateful functions.
  19. Recap

  20. 20

  21. 21

  22. 22

  23. Built using largely ad-hoc, time-consuming, low-level programming.

  24. None
  25. 26

  26. 27

  27. ρ (rho) Function-as-a-Service beyond Stateless

  28. Programming Model

  29. λ :: I ⇒ O (I: Domain, O: Codomain)

  30. Fn ≡ λ

    1. Pure Function (Fn) without State
    2. Fn with State
    3. Fn that supports Orchestration
  31. Stateless λ

    def pingFn(p: Ping): Pong = Pong(id, "Pong!")
  32. def pingFn(p: Ping, ctx: ExecutionContext[Pong]): λ[Pong] = ctx.returnWith(Pong(id, "Pong!"))
  33. Stateful λ

    def sPingFn(p: Ping, ctx: ExecutionContext[Pong], state: PingCounter): λ[Pong]

    case class PingCounter(name: String, i: Long, allRequestString: Seq[String]) extends ManagedState

    def pingFn(p: Ping, ctx: ExecutionContext[Pong]): λ[Pong]
  34. def sPingFn(p: Ping, ctx: ExecutionContext[Pong], state: PingCounter): λ[Pong] = {

      val message = "Pong"
      for {
        p ← ctx.persist(Pong(p.shardId, s"$message at ${java.time.Instant.now()}"))
      } yield p
    }
  35. FnNamespace

    Every Fn (λ) is part of a FnNamespace (which can be considered a BoundedContext).
    Diagram: FnNamespace containing λ1, λ2, λn.

    class PingServiceFn extends FnNamespace {
      override def descriptor: FnNamespaceDescriptor =
        named("PingService")
          .withQualifiedPaths(
            register("ping", pingFn _),
            register("sPing", sPingFn _),
            register("pingStat", pingStatFn _))
      // rest of the implementation
    }
  36. State Management. Diagram: FnNamespace containing λ1, λ2, λn.

  37. class PingServiceFn extends FnNamespace {

      type State = PingCounter
      def initialState: State = PingCounter("PingCounter", 0, Seq.empty)
      // rest of the implementation
    }
    Diagram: FnNamespace containing λ1, λ2, λn.
  38. Diagram: FnNamespace containing λ1, λ2, λn.

  39. Recall sPingFn

    def sPingFn(p: Ping, ctx: ExecutionContext[Pong], state: PingCounter): λ[Pong] = {
      val message = "Pong"
      for {
        p ← ctx.persist(Pong(p.shardId, s"$message at ${java.time.Instant.now()}"))
      } yield p
    }
  40. type EventHandler[S] = PartialFunction[(FnResponse, S), S]

    class PingServiceFn extends FnNamespace {
      // …
      def onPersist: EventHandler[State] = {
        case (Pong(_, s), PingCounter(n, i, allRequests)) ⇒ PingCounter(n, i + 1, s +: allRequests)
        case (_, state) ⇒ state // Do nothing with the State
      }
      // rest of the implementation
    }
    Diagram: FnNamespace containing λ1, λ2, λn.
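The `onPersist` handler above can be exercised in isolation with a small self-contained sketch. `FnResponse`, `ManagedState`, and the shape of `Pong` are stand-ins defined here because the talk does not show their real definitions, and `replay` is a hypothetical helper: it only illustrates how a runtime could fold persisted responses through the handler to obtain the current state.

```scala
object OnPersistSketch {
  // Stand-ins for library types named on the slides; the real definitions are not shown in the talk.
  trait FnResponse
  trait ManagedState
  final case class Pong(id: String, message: String) extends FnResponse
  final case class PingCounter(name: String, i: Long, allRequestString: Seq[String])
      extends ManagedState

  // Same alias as on the slide: (persisted response, old state) => new state.
  type EventHandler[S] = PartialFunction[(FnResponse, S), S]

  val onPersist: EventHandler[PingCounter] = {
    case (Pong(_, s), PingCounter(n, i, all)) => PingCounter(n, i + 1, s +: all)
    case (_, state)                           => state // unknown responses leave the state unchanged
  }

  // Hypothetical helper: derives the state by applying the handler to each persisted response in order.
  def replay(initial: PingCounter, persisted: Seq[FnResponse]): PingCounter =
    persisted.foldLeft(initial)((state, event) => onPersist((event, state)))
}
```

Keeping the handler a pure function of (response, state) means the same code serves both live updates and recovery from a log.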
  41. Putting it all together

  42. Execution Semantics

  43. We strongly believe that streaming dataflows can have a central place in service-oriented architectures, taking over the execution of ACID transactions and ensuring message delivery and processing, in order to perform scalable execution of services.
  44. Compiler

  45. FnNamespace1, FnNamespace2, FnNamespacen, each with a request and a response queue: ReqQ1/ResQ1, ReqQ2/ResQ2, ReqQn/ResQn

  46. CLI, Gateway, FnNamespace 1, FnNamespace 2, FnNamespace N, Events

  47. Recall PingServiceFn

  48. 50 1 2 3

  49. Orchestrator λ

    The programming abstraction supports calling another Fn (λ) from the same FnNamespace or from a different FnNamespace available in the system.
  50. OrderService, PaymentService, StockService; calls: reserveBalance, prepareOrder

  51. OrderService, PaymentService, StockService; calls: reserveBalance, prepareOrder

    for {
      p ← ctx.callFn[PaymentFnRequest, PaymentFnResult]("PaymentFn.reserveCredit", pr)
      _ ← ctx.persist(p) // updating the state
      s ← ctx.callFn[PrepareStockRequest, StockFnResponse]("StockFn.prepareOrder", ps)
      _ ← ctx.persist(s) // updating the state
    } yield orderCreationResponse(r, p, s)
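The sequencing behaviour of such a for-comprehension can be shown with a runnable sketch that swaps the talk's `λ[_]` effect for Scala's standard `Either`. This is an assumption made purely for illustration: `Result`, `reserveCredit`, `prepareOrder`, and `createOrder` below are hypothetical local functions, not Rho's `ctx.callFn`, and the sketch says nothing about how Rho handles compensation or transactions.

```scala
object OrchestrationSketch {
  // Stand-in effect type: Left carries a failure reason, Right a successful result.
  type Result[A] = Either[String, A]

  final case class PaymentResult(reservedCents: Long)
  final case class StockResult(reservedItems: Int)
  final case class OrderCreated(payment: PaymentResult, stock: StockResult)

  // Hypothetical payment step: succeeds only if the balance covers the amount.
  def reserveCredit(amountCents: Long, balanceCents: Long): Result[PaymentResult] =
    if (amountCents <= balanceCents) Right(PaymentResult(amountCents))
    else Left("insufficient credit")

  // Hypothetical stock step: succeeds only if enough items are available.
  def prepareOrder(qty: Int, available: Int): Result[StockResult] =
    if (qty <= available) Right(StockResult(qty))
    else Left("out of stock")

  // Steps run in order; the first failure short-circuits the remaining steps.
  def createOrder(amountCents: Long, balanceCents: Long, qty: Int, available: Int): Result[OrderCreated] =
    for {
      p <- reserveCredit(amountCents, balanceCents)
      s <- prepareOrder(qty, available)
    } yield OrderCreated(p, s)
}
```

The order is created only when both steps succeed, which mirrors the earlier requirement that an order goes through iff stock is available and the payment is cleared.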
  52. 54

  53. OrderService, PaymentService, StockService; calls: reserveBalance, prepareOrder

    for {
      p ← ctx.callFn[PaymentFnRequest, PaymentFnResult]("PaymentFn.reserveCredit", pr)
      _ ← ctx.persist(p) // updating the state
      s ← ctx.callFn[PrepareStockRequest, StockFnResponse]("StockFn.prepareOrder", ps)
      _ ← ctx.persist(s) // updating the state
    } yield orderCreationResponse(r, p, s)
  54. Recall

    FnNamespace1, FnNamespace2, FnNamespacen, each with a request and a response queue: ReqQ1/ResQ1, ReqQ2/ResQ2, ReqQn/ResQn
  55. Rho in Action

  56. OrderService, PaymentService, StockService; calls: reserveBalance, prepareOrder

  57. $ rho deploy -n OrderFn --parallelism 2 --skip_jar

    $ rho deploy -n PaymentFn --parallelism 2 --skip_jar
    $ rho deploy -n StockFn --parallelism 2 --skip_jar
  58. 60

  59. 61

  60. 62

  61. 63

  62. Performance

  63. 65

  64. None
  65. What’s Next?

  66. 68

  67. None
  68. Thanks

  69. Shameless plug by Asterios Katsifodimos (@kasterios)

    Online introductory course on stream processing starts on January 15th, covering all fundamental stream processing concepts (time, order, windows, joins, etc.). We are using Apache Flink for assignments and include invited talks from industry and academia. Enrollment is open: tudelft.nl/taming-big-data-streams
  70. Questions ?

  71. Adil Akhter

    Tech Lead at ING, Amsterdam, The Netherlands
    http://coyoneda.xyz
    adilakhter
  72. References

    1. "Stateful Functions as a Service in Action": Adil Akhter, Marios Fragkoulis, Asterios Katsifodimos. In the Proceedings of the 45th International Conference on Very Large Data Bases (VLDB) 2019.
    2. "Operational Stream Processing: Towards Scalable and Consistent Event-Driven Applications": Asterios Katsifodimos, Marios Fragkoulis. In the Proceedings of the 22nd International Conference on Extending Database Technology (EDBT) 2019.
    3. "Benchmarking Distributed Stream Data Processing Systems": Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl. In the Proceedings of the International Conference on Data Engineering (ICDE) 2018.
    4. "Efficient Window Aggregation with General Stream Slicing": Jonas Traub, Philipp M. Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. In the Proceedings of the 22nd International Conference on Extending Database Technology (EDBT) 2019.
  73. 75