payment is cleared.

Microservices Architecture (1): Easiest Implementation
• Services are stateless.
• The database does the heavy lifting.
• High latency, costly state access.
[Diagram: Order, Stock, and Payment services, each with its business logic and a DB reached through RPC calls and responses; the services are linked by REST calls.]
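The stateless pattern can be sketched as follows. This is a minimal illustration, not the talk's implementation: `FakeDB`, `checkout`, and the key layout are hypothetical, and the in-memory dictionary stands in for a remote database so that round trips can be counted.

```python
class FakeDB:
    """Hypothetical stand-in for a remote database; counts round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # each read would be a network call
        return self.data[key]

    def put(self, key, value):
        self.round_trips += 1  # each write would be a network call
        self.data[key] = value


def checkout(db, user, item, price):
    """Stateless handler: keeps nothing in memory between calls."""
    stock = db.get(("stock", item))
    credit = db.get(("credit", user))
    if stock < 1 or credit < price:
        return False
    db.put(("stock", item), stock - 1)
    db.put(("credit", user), credit - price)
    return True


db = FakeDB({("stock", "book"): 2, ("credit", "alice"): 15})
first = checkout(db, "alice", "book", 10)   # enough stock and credit
second = checkout(db, "alice", "book", 10)  # not enough credit left
```

Every call pays for several database round trips, which is the latency cost the slide points at.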
Microservices Architecture (2): Embedded State/DB
• Services are now stateful.
• Low-latency access to local state.
• Service calls are still expensive.
• It is not obvious how to scale this out (e.g., by sharding the state in some way).
• Fault tolerance is hard.
[Diagram: Order, Stock, and Payment services, each embedding its own DB next to the business logic; the services are linked by REST calls.]
Microservices Architecture (3): Event Sourcing
• Services are asynchronous/reactive.
• If we lose state, we replay the log and rebuild it.
• Time-travel debugging, audits, etc. are trivial.
[Diagram: Order, Stock, and Payment services, each with its own DB and business logic; labels: REST Call, event-log.]
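Rebuilding state by replaying the log can be sketched as below. The `Event` type, the `"added"`/`"removed"` event kinds, and the `replay` helper are assumptions made for illustration; the talk does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "added" or "removed" (hypothetical event types)
    item: str
    amount: int


def apply(state, event):
    """Pure transition: return a new stock state after one event."""
    new_state = dict(state)
    delta = event.amount if event.kind == "added" else -event.amount
    new_state[event.item] = new_state.get(event.item, 0) + delta
    return new_state


def replay(log, upto=None):
    """Rebuild state from the log; `upto` limits replay to a prefix."""
    state = {}
    for event in log[:upto]:
        state = apply(state, event)
    return state


log = [
    Event("added", "book", 5),
    Event("removed", "book", 2),
    Event("added", "pen", 3),
]

current = replay(log)        # lost state is rebuilt from the full log
past = replay(log, upto=1)   # "time travel": state after the first event
```

Because the log is the source of truth, recovery and time-travel debugging are the same operation with a different replay bound.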
[Diagram: service instances Order 1, Order 2, Order 3, Stock 1, Stock 2, and Payment 1, each with business logic and its own DB, connected through event logs; the label "Responses" also appears.]
some good news

[Diagram: three VMs, each running three functions (Fn), together with a cloud database.]
• Managed infrastructure (autoscaling!)
• Function-based programming model
• Stateless functions
• No state
• Fn-to-fn calls
• Transactions & orchestration among function calls
of each layer:
• Service layer: Flask/Spring, AWS Lambdas, Azure Functions, Akka
• Persistence layer: Postgres, CockroachDB, Mongo, Cassandra, Redis
• Infra layer: Docker+Kubernetes on Amazon or Google Cloud

Meanwhile at the TU Delft campus…
Goal: maximum consistent order.checkout() throughput in the Cloud.
Quiz question: out of 40 five-person teams, how many managed to sustain a deployment of 10K order checkouts per second and keep the data consistent?
• How do we make stateful computations fault tolerant?
• How do we (or should we) guarantee message delivery?
• How do we consistently query the global state of a full system?
• What abstractions should people use?

Answer to the quiz: none! State management is hard, and the current technology is primitive! (Or the students have learned nothing.)
• Application State Querying and Consolidation
• Versioned Deployments/State/Schema
• Transactions & Service Orchestration
• Functions & communication compiled to efficient implementations
• Time-Travel Debugging Capabilities
• Holistic Optimization of Service Compositions
Fragkoulis, Adil Akhter (ING Bank), Pedro Silvestre

[1] Operational Stream Processing: Towards Scalable and Consistent Event-Driven Applications. Asterios Katsifodimos, Marios Fragkoulis. In EDBT 2019.
[2] Stateful Functions as a Service in Action. Adil Akhter, Marios Fragkoulis, Asterios Katsifodimos. In VLDB 2019.

Together with:
orchestration
• Function call
• Transaction failure handling
• Function arguments
• Function state
• Function state update
*Currently compiles to cyclic Flink+Kafka dataflows.
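The labels above (function call, arguments, state, state update, failure handling) can be illustrated with a small sketch. It is plain in-process Python under stated assumptions, not the programming model from the talk and not the compiled Flink+Kafka dataflow; the `Stock`, `Payment`, and `checkout` names are hypothetical.

```python
class InsufficientStock(Exception):
    pass


class InsufficientCredit(Exception):
    pass


class Stock:
    """Hypothetical stateful function: one instance per item."""

    def __init__(self, quantity):
        self.quantity = quantity           # function state

    def subtract(self, amount):            # function arguments
        if amount > self.quantity:
            raise InsufficientStock()
        self.quantity -= amount            # function state update

    def add(self, amount):
        self.quantity += amount            # function state update


class Payment:
    """Hypothetical stateful function: one instance per user."""

    def __init__(self, credit):
        self.credit = credit               # function state

    def pay(self, price):
        if price > self.credit:
            raise InsufficientCredit()
        self.credit -= price               # function state update


def checkout(stock, payment, amount, price):
    """Orchestration: call two functions, undo the first if the second fails."""
    try:
        stock.subtract(amount)             # function call
    except InsufficientStock:
        return False
    try:
        payment.pay(price)                 # function call
    except InsufficientCredit:
        stock.add(amount)                  # failure handling: compensate
        return False
    return True


stock = Stock(3)
payment = Payment(10)
ok = checkout(stock, payment, 1, 8)        # both calls succeed
failed = checkout(stock, payment, 1, 8)    # payment fails, stock is restored
```

The compensation step keeps stock and payment consistent when the second call fails, which is the kind of bookkeeping a runtime with built-in transactions would take over.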
[Diagram: a router and services SVC1 to SVC5, with input message queues, managed operator state, and an output message queue; legend: message, control event (commit, prepare, snapshot marker, etc.).]
• Time-travel debugging using checkpoints and the message broker.
• Guaranteed message delivery and exactly-once processing.
• Each operator executes a (group of) microservices or functions that share the same state.
• Operator-local state is partitioned on the input key for scalability and fault tolerance.
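How key partitioning, checkpoints, and a replayable message queue fit together can be sketched as below. This is a simplified single-process illustration under stated assumptions: `partition_for`, `Operator`, and `run` are hypothetical names, a Python list stands in for the message broker, and the snapshot is taken directly rather than triggered by a control event.

```python
import copy
import zlib


def partition_for(key, num_partitions):
    """Stable hash so the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Operator:
    """Hypothetical operator holding keyed state for one partition."""

    def __init__(self):
        self.state = {}    # key -> running total
        self.offset = 0    # position of the next message to process

    def process(self, message):
        key, amount = message
        self.state[key] = self.state.get(key, 0) + amount
        self.offset += 1

    def snapshot(self):
        """Checkpoint: copy of the state plus the input offset it reflects."""
        return copy.deepcopy(self.state), self.offset

    def restore(self, checkpoint):
        state, offset = checkpoint
        self.state = copy.deepcopy(state)
        self.offset = offset


def run(operator, log):
    """Process every message in the replayable log from the current offset."""
    while operator.offset < len(log):
        operator.process(log[operator.offset])


log = [("order-1", 5), ("order-2", 7), ("order-1", 1), ("order-2", 3)]

# Router: messages with the same key always go to the same input queue.
NUM_PARTITIONS = 2
queues = [[] for _ in range(NUM_PARTITIONS)]
for message in log:
    queues[partition_for(message[0], NUM_PARTITIONS)].append(message)

# Checkpoint and recovery, shown on a single queue for brevity.
op = Operator()
op.process(log[0])
op.process(log[1])
checkpoint = op.snapshot()       # taken after two messages
op.process(log[2])               # processed, then the operator "crashes"

recovered = Operator()
recovered.restore(checkpoint)    # roll back to the checkpoint
run(recovered, log)              # replay the rest from the queue
```

Since the checkpoint stores the state together with the offset it reflects, replaying from that offset applies each message to the state exactly once, even though the third message was processed twice overall.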
were around. We need to fix this. How?
• Talk to your Cloud provider
• Support initiatives like Cloudstate.io, Flink’s Stateful Functions, MS Orleans, CloudBurst, etc.
• Talk to me if you want to know more about our plans!